INTRODUCTION

The subject of this pamphlet is coincidence. 

“Coincidence” as the term is used here may be defined as a recurrence of a letter in the 
same place, or in a corresponding place, as when two texts are lined up one under the other, 
letter for letter. 

Mathematical evaluation assists the cryptanalyst first in preparing his material for attack, 
and later in the actual attack itself. It assists specifically in answering the following questions. 

1) How much like random, or how different from random, is this text? 

2) How similar are these texts? 

3) How significant is this variation from random? 

4) How significant is this similarity?
I. Simple Monographic Comparison.

When examining cipher text looking for a break the cryptanalyst keeps in mind as a standard 
for comparison “random” text. This is text which has no meaning or system behind it, in which 
each letter can appear as often as any other, in which each digraph can appear as often as any 
other, and generally no significant pattern can appear except in small samples as the result of 
chance deviations. Text which appears to meet these conditions is sometimes described as 
“flat”. Text which fails to be random in some way is called "rough”. 

A test has been devised which measures whether two texts are rough in the same way. This 
test is performed by writing the texts one above the other and counting the occasions for which 
the same letters come together, such as an E over an E, called a “coincidence”. The ratio of
the number of coincidences to the number of coincidences expected in random text is called the
“index of coincidence”, and is abbreviated as I. C. or $\iota$:

    $$\iota = \frac{\text{actual coincidences}}{\text{expected coincidences}}$$

If the two texts were random then a coincidence would occur once in 26 trials (for a 26 letter alphabet), or
3.85% of the time. If the two texts were English then there would be more coincidences, almost
7%. The percentage found divided by 3.85% is the I. C.
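The count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `comparison_ic`, the letters-only filtering, and the choice to compare over the shorter of the two texts are not taken from the pamphlet.

```python
def comparison_ic(text_a: str, text_b: str, alphabet_size: int = 26) -> float:
    """Comparison I. C.: coincidences found / coincidences expected in random text."""
    # Keep letters only and compare position by position over the common overlap.
    a = [ch for ch in text_a.upper() if ch.isalpha()]
    b = [ch for ch in text_b.upper() if ch.isalpha()]
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("no overlapping letters to compare")
    found = sum(1 for x, y in zip(a, b) if x == y)
    expected = overlap / alphabet_size  # one coincidence per 26 trials in random text
    return found / expected
```

For instance, two four-letter texts agreeing in two places give 2 / (4/26) = 13, a reminder of how unstable the index is on very short samples.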

Most European languages have an I. C. of about 2. For random text the I. C. is 1. The 
expected I. C. for English can be computed as follows: 

Take two pages of English text. Make a chance selection from each page. 

There are about 130 chances in 1,000 of the first letter's being an "E” (See table following). 
There are about 130 chances in 1,000 of the second letter’s being an “E”. 

There are about 16,900 chances in 1,000,000 of both letters' being an "E”. 

Likewise, there are 8,464 chances in 1,000,000 of both being “T“, 6,400 chances in 1,000,000 
of both being “N”, etc. 
Table

Text letter       Chances in 1,000    Chances in 1,000    Chances in 1,000,000
(telegraphic      of 1st letter's     of 2nd letter's     of both letters'
text)             being this letter   being this letter   being this letter

E                   130                 130                 16,900
O                    75                  75                  5,625
A                    74                  74                  5,476
I                    73                  73                  5,329
N                    80                  80                  6,400
R                    76                  76                  5,776
S                    61                  61                  3,721
T                    92                  92                  8,464
D                    42                  42                  1,764
H                    34                  34                  1,156
L                    36                  36                  1,296
C                    31                  31                    961
M                    25                  25                    625
U                    26                  26                    676
P                    27                  27                    729
F                    28                  28                    784
G                    16                  16                    256
Y                    19                  19                    361
B                    10                  10                    100
V                    15                  15                    225
W                    16                  16                    256
K                     4                   4                     16
J                     2                   2                      4
Q                     2                   2                      4
X                     5                   5                     25
Z                     1                   1                      1

Any letter        1,000               1,000                 66,930


Finally, there are 66,930 chances in 1,000,000 (the sum of the chances for the individual letters) 
of both letters’ being the same letter in a chance selection. Therefore, if we select many pairs of 
plain text letters, the average number of identical letters to be expected “in the long run” will be 
6.69% (about 1/15) of the total number of possible coincidences. 



We may call this number the expected coincidences in English text. 



The expected I. C. for English (or monoalphabetic cipher text) is

    $$\frac{.0669}{.0385} = 1.73$$
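The arithmetic of the table can be checked with a short Python sketch. The frequencies are copied from the table; the names `FREQ_PER_1000`, `chances_both_same` and `expected_ic` are illustrative assumptions. With the unrounded value 1/26 the quotient comes out near 1.74; the 1.73 above results from using the rounded figures .0669 and .0385.

```python
# Chances in 1,000 for each letter, copied from the table (telegraphic English).
FREQ_PER_1000 = {
    "E": 130, "O": 75, "A": 74, "I": 73, "N": 80, "R": 76, "S": 61, "T": 92,
    "D": 42, "H": 34, "L": 36, "C": 31, "M": 25, "U": 26, "P": 27, "F": 28,
    "G": 16, "Y": 19, "B": 10, "V": 15, "W": 16, "K": 4, "J": 2, "Q": 2,
    "X": 5, "Z": 1,
}

def chances_both_same(freq_per_1000):
    """Chances in 1,000,000 that two independently chosen letters are the same."""
    return sum(f * f for f in freq_per_1000.values())

def expected_ic(freq_per_1000, alphabet_size=26):
    """Probability of a coincidence divided by the random-text probability 1/c."""
    p_same = chances_both_same(freq_per_1000) / 1_000_000
    return p_same * alphabet_size

total = chances_both_same(FREQ_PER_1000)  # sum of the last column of the table
ic = expected_ic(FREQ_PER_1000)
```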



The actual I. C. of unknown cipher text may take almost any value but in practice on small 
samples the range will generally extend from about .80 to about 2.00 (simple monographic Index 
of Coincidence). 

The value of the index of coincidence for a given English text will depend on the distribution 
of letters in that text. Repetitions in short texts will increase the index of coincidence. Text 
with few repetitions will give an I. C. approaching the theoretical 1.73. As the expected number 
of chance coincidences is based on a flat frequency (where each cipher letter is ultimately used 
the same number of times) any cipher text that differs radically from such frequency distribution
will have a correspondingly higher I. C. This is especially noticeable in short cipher texts where
the frequencies have not had an opportunity to “flatten out”.

REF ID: A64687

The monographic I. C. of English text will increase with small amounts of text to 1.80-2.00 
(as compared with the theoretical 1.73) and small amounts of random text will give I. C.'s of 
1.10-1.20 (as compared with the theoretical 1.00). The amount of excess attributable to the 
sample size will be discussed later, under “standard deviation”. 

For most European languages the expected I. C. is higher than in English, due to the more
irregular letter distribution of their normal alphabets, namely:



Language          Expected I. C.

Random text         1.00
English             1.73
Russian             1.77
Italian             1.93
Spanish             1.94
Portuguese          1.94
French              2.02
German              2.04



II. Polygraphic Comparison. 

In addition to the simple monographic Index of Coincidence, there are occasions when the
digraphic I. C. ($\iota_2$), trigraphic I. C. ($\iota_3$), tetragraphic I. C. ($\iota_4$), pentagraphic I. C. ($\iota_5$), etc., can
be used to advantage. They are derived from the normal digraphic (trigraphic, etc.) frequency
tables in the manner indicated in section I.

Expected values for the simple digraphic index of coincidence are as follows:

Language          $\iota$     $\iota_2$

Random text       1.00        1.00
English           1.73        4.65
Russian           1.77        3.64
Italian           1.93        5.47
Spanish           1.94        6.15
Portuguese        1.94        5.67
French            2.02        6.28
German            2.04        7.47



Note: The index might vary widely from this estimate. 

In practice the actual polygraphic I. C.’s will usually run higher than their theoretical values,
and a repeated word or two in short texts will make them skyrocket. As typical examples, we
have taken the plain text of four problems in the Navy elementary and secondary Crypt courses
and computed various I. C.'s (the monographic, digraphic, trigraphic, tetragraphic and
pentagraphic Indices of Coincidence).



Text               monographic   digraphic   trigraphic   tetra     penta

Expected random    1.00          1.00        1.00
Expected plain     1.75          4.65        27.89        ?         ?
Problem No. 1      1.80          5.23        29.11        427.      7240.
Problem No. 2      2.00          7.73        66.04        1062.     14900.
Problem No. 3      1.91          5.60        42.04        666.      12070.
Problem No. 4      1.74          4.90        31.70        456.      9190.

III. Theoretical Recapitulation.

The phenomena just described will be pictured again in general mathematical terms. Imagine
text in a language with c letters. Here it may be that c = 10 for digits, or c = 24 for
Greek, or c = 26 for English, or c = 50 for Japanese kana.

Suppose we have a language for which we know the proportions of the letters are
$p_1, p_2, \ldots, p_c$,

    $$p_1 + p_2 + \cdots + p_c = \sum_{i=1}^{c} p_i = 1. \tag{1}$$

Suppose further that we have two pieces of text from this language and line them up one
above the other, and then count coincident letters. What is the expected number?

At a particular place the probability of a coincidence involving the i th letter is $p_i^2$.
Therefore, the c cases being mutually exclusive, the probability of an incidence is

    $$p_1^2 + p_2^2 + \cdots + p_c^2 = \sum_{i=1}^{c} p_i^2. \tag{2}$$

If the length of overlap is N, then the expected number of incidences is

    $$N \sum_{i=1}^{c} p_i^2.$$

If the text is such that $p_i = p_j = 1/c$ for all values of i and j we will refer to it as “flat”, or
“random”. The probability of an incidence in flat text is $\sum_{i=1}^{c} 1/c^2 = 1/c$, and the expected number
is $N/c$.

The ratio of the number g found in a comparison to that expected is called the “index of
coincidence”,

    $$\iota = \frac{g}{N/c} = \frac{cg}{N}. \tag{3}$$

The expected value $\gamma$ of the I. C. for our language is given by taking the expected value of
g, $E(g) = N \sum_{i=1}^{c} p_i^2$, over the expected value for flat text, or

    $$\gamma = \frac{N \sum_{i=1}^{c} p_i^2}{N/c} = c \sum_{i=1}^{c} p_i^2. \tag{4}$$

Notice that the expected value of the I.C. for flat text is 1, since $p_i = 1/c$ is constant. This is the
smallest value that $\gamma$ can have. The other extreme would be for one letter to occupy every
position, that is, $p_1 = 1$ and $p_i = 0$ for $i \ne 1$. Then

    $$\gamma = c. \tag{5}$$

Thus we have

    $$1 \le \gamma \le c. \tag{6}$$
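The two extremes of formula (6) can be illustrated numerically. The sketch below assumes the expected I. C. is c times the sum of the squared proportions, as in (4); the function name `gamma_ic` and the two sample distributions are hypothetical.

```python
def gamma_ic(proportions):
    """Expected I. C. of formula (4): c times the sum of squared proportions."""
    c = len(proportions)
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return c * sum(p * p for p in proportions)

flat = [1 / 26] * 26              # every letter equally likely: lower bound, 1
one_letter = [1.0] + [0.0] * 25   # one letter fills every position: upper bound, c
```

Any other distribution over 26 letters falls between these two values.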



IV. Examples of Use. 

(A) TO DETERMINE WHETHER TWO MESSAGES ARE IN THE SAME KEY. 

During U. S. Fleet Problem V (1925) the Battle Fleet used a cipher of their own design. 

A total of 13 messages in this cipher were submitted to the Code and Signal Section for attack. 
Although a different indicator was used in each case, it was suspected that some of the messages 
might be in the same key. Two messages in one key (example No. 1) and two more in another
key (example No. 2) were discovered. (The messages were eventually solved). 

Each message was “lined up” with each other message and the coincidences were noted. 
(See examples No. 1 and No. 2). 

Example No. 1 



K T X V H J G P Z J W B J M F S U G P N S V S O P N F D N G
R O A A O R G P Z E Z G J F R Z P S O I U I Q M M F D H O F

J H U Y L I M A L S B N B J X W M P W F W V C U C D F G R L
M N R J O G O S I C Y U G U D I M D C K W Z P R P J L E R R

V G P U B X P M ? C O B G X R J S P V P W C F W P G J V Q B
K L A G P A D X Y Y K H H K C I U Q P Y U O P J J F R B G X

Y F B D S L J O C N V V S L J O D S O O L P R O C G S P U A
Z B N O P O J N Y Z Y T Z L S K R A J O P F Y F R X N D G E

C D B O C V D K Q B S P E L T R N V I U
G Q Q H N L Q V V A G P T Y C G C P N X J Q U E R D Q W W Q

Q Q U I I V H B K Q


Coincident letters are underscored. 12 coincidences occur in 140 pairs of letters.
Simple monographic coincidence.

    Expected coincidences $= \frac{N}{c} = \frac{140}{26} = 5.4$

where N = number of letters examined, and c = number of distinct letters = 26.

    $$\text{IC} = \frac{12}{5.4} = 2.2$$  (Messages are in same key)

There is one repeated trigraph, GPZ, in the messages under examination. This coincidence 
indicates that the keys correspond at that point, but does not necessarily indicate that the keys 
correspond throughout the message. To verify coincidence of the keys throughout the two mes- 
sages, we must have our coincidences spread through the messages in question. (As they were 
in the above example). 

Likewise, digraphic and trigraphic coincidences may be evaluated to an index of coincidence. 
For example, in the above messages, 2 coincident digraphs were found (GP and PZ) (also one 
coincident trigraph). In this message there were 139 digraphs and 138 trigraphs in alignment 
with possibilities of coincidence. 

    $$\frac{N}{676} = \frac{139}{676} = .206$$

digraphs were to be expected from chance. Two were found, giving an IC

    $$\frac{2}{.206} = 9.7.$$



This value, far above the normal 4.65 index of coincidence, does not necessarily indicate the 
messages are in the same key because of the smallness of the sample. The extremely high I.C., 
9.70, may be due to the small amount of text involved in this example. As the amount of text 
decreases, the variation of the I. C. from the expected will become more pronounced, until at 
times it is possible that small amounts of text may give entirely false indications. This effect 
will be discussed more fully under “standard deviation”.
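The two indices of Example No. 1 can be reproduced from the counts alone. In this sketch the helper name `ic_from_counts` and its parameter `n` (1 for single letters, 2 for digraphs) are illustrative assumptions; the counts 12 in 140 and 2 in 139 are those given in the text.

```python
def ic_from_counts(found: int, positions: int, alphabet_size: int = 26, n: int = 1) -> float:
    """I. C. for n-graphs: coincidences found / (positions / alphabet_size**n)."""
    expected = positions / alphabet_size ** n
    return found / expected

mono = ic_from_counts(12, 140)       # 12 coincident letters in 140 pairs
di = ic_from_counts(2, 139, n=2)     # 2 coincident digraphs in 139 positions
```

Rounded to one decimal these give 2.2 and 9.7, the values quoted above.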
Example No. 2

A Z E P U U Y C N O Z E S F J X C T A T M J C G G J F K U B
B C A E Z Q Q V U Q O O E J W W D T Q Q S P F W C T O H P Z

P E K D A P N U T J D D U V Q Q I T N P E X T G H T K D C L
J S D N D U J K L U Z J C S Q H I O Z H U K E G X D E P W T

Z R P W A M M T I Q J E P K F D C O V P D U C H Z W X M G E
P R F R X P I Q V A F Q R P F E A Q P Q E ? O Y E G X R O Y

Q V H I T A F F Z R Y E G O A Z B Z B G D V B J A U B E E P
Y P G A V X L A X Q E O L J O U B H B S F I J W D D J P S S

N M K O C Z D I X B Y A L V S P G P X F N U E F X N W D F L
D T E P P D Z G T J O C V H T P M R P H T Y C Q X L L V R N

A K N A V Q A X P O P R P P R P P Q P U D B L M N J Q X U E
Q F Y A X G O L U P Q F D R W D L Z W I T B D I I P Y Y V G

Q A W S Q J N G P E O A U D N Z A A P F A Z Y O T N Q V V X
U M N G C F J F L Q R M E D F O Z N S P E C D S L I Z Q Y A

I I D T V W C C Y B X O Y U O S Q D N O B A S H H C T D D U
H N I X Y A O J B K




There are 14 coincidences in 220 pairs of letters.

    $\frac{N}{c} = \frac{220}{26} = 8.46$ coincidences expected. $\text{IC} = \frac{14}{8.46} = 1.66$ (almost normal for English).

These two messages probably are in the same key (and actually proved to be). Note that there 
are no repeated digraphs or trigraphs. Note also that coincidences are well spread out. 

(B) TO DETERMINE WHERE TWO MESSAGES OVERLAP. 

Two messages may be in depth but not at the beginnings. To place them relative to one 
another is the first problem prior to reading them, and this can be done by means of the I. C. if 
the overlap is sufficiently long. To place them the following or its equivalent should be done. 

Copy each message on a single line on a separate page of paper, omitting all spaces between 
groups and taking care to space the letters uniformly. Then place one page upon the other so 
that letters of one message fall above those of the other. Note the number of coincidences and 
the total overlap. Then shift one message to the right and count again. 

The highest index observed is the best candidate for depth. Since we have examined a 
number of indices some of them may be high by chance, especially for small overlaps. The in- 
terpretation of the results of such a sliding operation requires sophistication beyond the scope of 
this pamphlet, and will not be discussed. The simplest rule of thumb for dealing with it is to 
compare the highest index with the second highest. If they are close they are probably both 
high by chance, since only one at most can be causal; if they are disparate then the highest has
a better chance of representing a depth. The digraphic and trigraphic coincidences may help 
decide the value of a line-up. 

The process just described is a basic one, and variations of it will always be useful to the 
cryptanalyst. 
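The sliding procedure just described might be sketched as follows. This is a hedged illustration: the function names `slide_indices` and `best_two`, the `min_overlap` cutoff, and the choice to slide in one direction only are assumptions made for brevity, and in practice one would also slide the other message.

```python
def slide_indices(msg_a: str, msg_b: str, alphabet_size: int = 26, min_overlap: int = 1):
    """Return (shift, overlap, coincidences, index) for each shift of msg_b to the right."""
    results = []
    for shift in range(len(msg_a)):
        overlap = min(len(msg_a) - shift, len(msg_b))
        if overlap < min_overlap:
            break  # the overlap only shrinks as the shift grows
        hits = sum(1 for i in range(overlap) if msg_a[shift + i] == msg_b[i])
        results.append((shift, overlap, hits, hits * alphabet_size / overlap))
    return results

def best_two(results):
    """Highest and second-highest indices, for the rule of thumb in the text."""
    ranked = sorted(results, key=lambda r: r[3], reverse=True)
    return ranked[:2]
```

Setting `min_overlap` to a reasonable size guards against the very short overlaps whose indices are high by chance, which is the concern raised above.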

V. The Roughness of a Single Sample. 

We have introduced the comparison I. C. as a measure of the match between two pieces of 
text. We can extend this idea now to a measure of the roughness of a single sample. Suppose 
we have a piece of text which we duplicate on two slips of paper and then place them one under
the other for the purpose of counting coincidences. There will be one position of total coincidence,
which we will rule out. If we compute the I. C. for all other positions, we will have what we call
the “index of coincidence of a single sample”.

If there are M letters in our sample, we will have looked in

    $$N = \frac{M(M-1)}{2} \tag{8}$$

distinct places.

If the text were drawn from a flat universe we would expect $\frac{N}{c} = \frac{M(M-1)}{2c}$ coincidences. That
is, at $\frac{1}{c}$ of the N places there will be coincidences. (9)

If the text is not flat but has proportions $p_1, p_2, \ldots, p_c$ of letters, then there are $f_i = p_i M$
occurrences of the i th letter. In the course of our counting we will compare every letter with every
other, so that the i th letter will give rise to

    $$\frac{f_i (f_i - 1)}{2} \tag{10}$$

coincidences, or

    $$\sum_{i=1}^{c} \tfrac{1}{2} f_i (f_i - 1) \text{ in all.} \tag{11}$$

Comparing this with the expected in flat text we get the I. C.

    $$\delta = \frac{\sum \tfrac{1}{2} f_i (f_i - 1)}{\tfrac{1}{2c} M (M-1)}, \tag{12}$$

or

    $$\delta = \frac{c \sum f_i (f_i - 1)}{M (M-1)}. \tag{13}$$

In theoretical context (12) is more useful than (13), but (13) is a little simpler for computational
purposes.

Notice that this formula is different from (4). This is because of the omission of the perfect
hit. If M is large enough then $f_i - 1$ can be replaced by $f_i = p_i M$ and $M - 1$ by M, so that

    $$\delta \approx \frac{c \sum f_i^2}{M^2} = \gamma. \tag{14}$$

We see that

    $$\delta \, \frac{M-1}{M} = \gamma - \frac{c}{M}, \tag{15}$$

or

    $$\delta (M-1) = \gamma M - c, \qquad \delta = \frac{\gamma M - c}{M-1}, \qquad \text{or} \qquad \gamma = \frac{\delta (M-1) + c}{M}. \tag{16}$$

From (12) the expected value of $\delta$ for flat text is

    $$E(\delta) = E\left[\frac{\sum \tfrac{1}{2} f_i (f_i - 1)}{\tfrac{1}{2c} M (M-1)}\right] = \frac{E\left(\sum \tfrac{1}{2} f_i (f_i - 1)\right)}{\tfrac{1}{2c} M (M-1)} = \frac{\tfrac{1}{2c} M (M-1)}{\tfrac{1}{2c} M (M-1)} = 1.$$

From this (16) gives

    $$E(\gamma) = E(\delta) \, \frac{M-1}{M} + \frac{c}{M} = 1 + \frac{c-1}{M}.$$

The error in using $\gamma$ (usually more convenient) in place of $\delta$ is

    $$\gamma - \delta = \frac{c - \gamma}{M-1} = \frac{c - \delta}{M}. \tag{17}$$

This error is always positive, that is, $\gamma$ is an over-estimate of $\delta$.

The error is smaller for larger values of $\gamma$, or larger values of M.

Notice that $\gamma$ is a measure of the shape of the distribution only, and is independent of the
sample size, as

    $$\gamma = \frac{c \sum f_i^2}{M^2} = \frac{c \sum p_i^2 M^2}{M^2} = c \sum p_i^2, \tag{18}$$

since $f_i = p_i M$.

But $\delta$ does depend on the sample size M, which is a desirable characteristic, since random
roughness is usually present in small samples. For smaller samples

    $$\delta = \gamma - \frac{c - \gamma}{M-1} \tag{19}$$

is seen to be smaller, thus automatically compensating to some degree for small sample errors.

We will usually measure the roughness of single samples by $\delta$, using $\gamma$ as an asymptotic
approximation.
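The relation between the two single-sample measures can be checked numerically. The sketch below assumes the forms $\gamma = c \sum f_i^2 / M^2$ and $\delta = c \sum f_i (f_i - 1) / (M(M-1))$; the function name `gamma_and_delta` and the sample text are hypothetical.

```python
from collections import Counter

def gamma_and_delta(text: str, alphabet_size: int = 26):
    """Return (gamma, delta) for one sample of text."""
    counts = Counter(ch for ch in text.upper() if ch.isalpha())
    m = sum(counts.values())
    if m < 2:
        raise ValueError("need at least two letters")
    # gamma counts every pair including each letter with itself; delta omits that perfect hit.
    gamma = alphabet_size * sum(f * f for f in counts.values()) / (m * m)
    delta = alphabet_size * sum(f * (f - 1) for f in counts.values()) / (m * (m - 1))
    return gamma, delta
```

For the six-letter sample AABBBC this gives gamma of about 10.11 and delta of about 6.93, and the two satisfy delta (M - 1) = gamma M - c, so gamma is the over-estimate.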



VI. Examination of Cipher Alphabets and Cipher Texts and Coherence. 

The indices of coincidences discussed in the previous paragraphs may be used in analyzing 
the internal structure of a cipher alphabet. For example, a message of 173 letters has a frequency 
table as given below: 



 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 6  3  0 14  2  2 10 22  6  4  8 13  1  0 14 10  0 13  2 13  0  7 19  1  2  1


We count the number of times a letter occurs with the same frequency, thus: 








Tally     Number of       f(f-1)/2      n f(f-1)/2
  f       tallies n

  0          4               0              0
  1          3               0              0
  2          4               1              4
  3          1               3              3
  4          1               6              6
  6          2              15             30
  7          1              21             21
  8          1              28             28
 10          2              45             90
 13          3              78            234
 14          2              91            182
 19          1             171            171
 22          1             231            231

                                         1000

We obtain the I.C. by formula (13):

    $$\delta = \frac{c \sum f_i (f_i - 1)}{M (M-1)}$$

Substituting,

    $$\delta = \frac{1000 \times 52}{173 \times 172} = 1.75$$

The index of coincidence indicates that a monoalphabetic substitution was employed. 

As a second example to show the results obtained from small texts, we calculate as follows 
from a frequency count of 36 letters assumed to be monoalphabetic. 



 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 3     5        3  1  1  1        3     2  1        1  3  1  2     1  4  4

Tally     Number of       f(f-1)/2      n f(f-1)/2
  f       tallies n

  0         10               0              0
  1          7               0              0
  2          2               1              2
  3          4               3             12
  4          2               6             12
  5          1              10             10

                                           36

    $$\delta = \frac{36 \times 26 \times 2}{36 \times 35} = 1.49.$$



The alphabet in question was actually a monoalphabetic substitution. With a small amount
of text, the simple index is somewhat indeterminate. It is again emphasized that sufficient text
must be used to give positive indications.
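Both worked examples can be recomputed directly from the frequency counts. In this sketch the function name `delta_from_counts` is an assumption, the two lists give the counts for A through Z as read from the frequency tables, and blank entries in the second count are taken as zero.

```python
def delta_from_counts(counts, alphabet_size=26):
    """Single-sample I. C.: c * sum f(f-1) / (M(M-1))."""
    m = sum(counts)
    return alphabet_size * sum(f * (f - 1) for f in counts) / (m * (m - 1))

# Frequencies A..Z of the 173-letter message.
first = [6, 3, 0, 14, 2, 2, 10, 22, 6, 4, 8, 13, 1,
         0, 14, 10, 0, 13, 2, 13, 0, 7, 19, 1, 2, 1]
# Frequencies A..Z of the 36-letter sample (blank entries taken as zero).
second = [3, 0, 5, 0, 0, 3, 1, 1, 1, 0, 0, 3, 0,
          2, 1, 0, 0, 1, 3, 1, 2, 0, 1, 4, 4, 0]

delta_first = delta_from_counts(first)
delta_second = delta_from_counts(second)
```

Rounded to two decimals the results are 1.75 and 1.49, agreeing with the tally method, which groups equal frequencies only to shorten the hand computation.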



As another elementary example of the application of the index of coincidence to the internal 
examination of a cipher text, we have a 5-letter repetition at an interval of 85. Is the cipher a 
polyalphabetic cipher of 5 or 17 alphabets? By means of internal examination with the index 
of coincidence we can decide between these alternatives. 

Make a frequency count of the cipher alphabets assuming 5 and then 17 alphabets. Calcu- 
late the index of coincidence in each case for one or more pairs of alphabets. The indices of 
higher value will indicate which assumption is correct. If neither assumption shows positive 
results (an index around 1.7) we may have a cipher of more complex nature. 
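One way to carry out such a test is sketched below. It is a simplified variant resting on assumptions: instead of comparing selected pairs of alphabets it averages the single-sample index over all columns for each assumed period, and the names `delta_ic` and `mean_column_delta` are hypothetical.

```python
from collections import Counter

def delta_ic(letters, alphabet_size=26):
    """Single-sample I. C. of a list of letters; 0.0 if fewer than two letters."""
    counts = Counter(letters)
    m = len(letters)
    if m < 2:
        return 0.0
    return alphabet_size * sum(f * (f - 1) for f in counts.values()) / (m * (m - 1))

def mean_column_delta(cipher: str, period: int, alphabet_size: int = 26) -> float:
    """Average single-sample I. C. over the columns obtained for an assumed period."""
    letters = [ch for ch in cipher.upper() if ch.isalpha()]
    columns = [letters[i::period] for i in range(period)]
    return sum(delta_ic(col, alphabet_size) for col in columns) / period
```

Calling `mean_column_delta(cipher, 5)` and `mean_column_delta(cipher, 17)` and comparing the two averages follows the reasoning of the paragraph above: the correct period should give columns that look monoalphabetic.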

Another elementary application is as follows. We have a cipher message which has been
intercepted. The I. C. is computed and found to be $\delta = 1.79$. This is so rough as to resemble
plain text. A simple substitution has the property of leaving $\delta$ unchanged, and so has a
transposition. Polyalphabetic substitutions lower the I. C. So we are reduced to three hypotheses,
that our sample is either transposed plain text, a simple substitution, or both substitution and
transposition.

The digraphic I. C. is computed, $\delta_2 = 4.85$. Its expected value is $\delta_2 = 3.05$ in view of the
known monographic roughness. Since a transposition destroys coherence the evidence is that no
transposition is involved. Polygraphic indices of coincidence are preserved by a simple
substitution.


VII. The Standard Deviation. 

We have already several times referred to the fact that these statistics are useful only if the
sample is large enough. To get an idea as to whether this is the case or not we measure our
results in terms of a standard deviation, “sigma.” One standard deviation is roughly one half
the width of a band which when placed about the average will include two-thirds of the data.
It is a measure of the dispersion. If sigma is large the data is spread out wide, and if it is small
the numbers are close together. In a binomial distribution the standard deviation is $\sigma = \sqrt{Npq}$
where N is the number of observations and p and q are the probabilities of success and failure.

To estimate the significance of $\delta$, we refer to (12), where the denominator is the expected
number of incidences in flat text, and the numerator is the number found. The standard deviation
of $\delta$ is

    $$\sigma(\delta) = \sqrt{\frac{2(c-1)}{M(M-1)}}, \tag{20}$$

see Appendix III. That is, in the first example of the previous section $\delta = 1.75$, and

    $$\sigma(\delta) = \sqrt{\frac{2 \times 25}{173 \times 172}} = .04.$$

Thus $\delta$ is $\frac{.75}{.04} = 19$ standard deviations above the value expected from a flat universe. In the
second example of the previous section, $\delta = 1.49$ and

    $$\sigma(\delta) = \sqrt{\frac{2 \times 25}{36 \times 35}} = .20.$$

Thus $\delta$ is $\frac{.49}{.20}$ above the expected value of 1, or 2½ standard deviations. The significance to be
attached to this will be discussed later.



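If S is the “sigmage” or the deviation from expected in terms of $\sigma$ we have

    $$S = \frac{\delta - 1}{\sqrt{\dfrac{2(c-1)}{M(M-1)}}} = \frac{(\delta - 1)\sqrt{M(M-1)}}{\sqrt{2(c-1)}}. \tag{21}$$

Using the $\gamma$ I.C. we get

    $$S = \frac{\gamma - 1}{\sqrt{\dfrac{2(c-1)(M-1)}{M^3}}} = \frac{(\gamma - 1)\, M \sqrt{M}}{\sqrt{2(c-1)}\,\sqrt{M-1}}, \tag{22}$$

using the results of Appendix III.

In either case approximately

    $$S = \frac{\gamma - 1}{\sqrt{2(c-1)}}\, M \quad \text{or} \quad S = \frac{\delta - 1}{\sqrt{2(c-1)}}\, M, \tag{23}$$

with an error introduced by this approximation of less than 1% for M > 51.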
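The size of the approximation error can be checked with a short sketch. It assumes the standard deviation of the single-sample index is $\sqrt{2(c-1)/(M(M-1))}$ and that the approximate form simply replaces $\sqrt{M(M-1)}$ by M; the function names are hypothetical.

```python
import math

def sigma_delta(m: int, alphabet_size: int = 26) -> float:
    """Assumed standard deviation of the single-sample I. C. for flat text."""
    return math.sqrt(2 * (alphabet_size - 1) / (m * (m - 1)))

def sigmage_exact(delta: float, m: int, alphabet_size: int = 26) -> float:
    """Bulge (delta - 1) divided by the standard deviation."""
    return (delta - 1) / sigma_delta(m, alphabet_size)

def sigmage_approx(delta: float, m: int, alphabet_size: int = 26) -> float:
    """Same quantity with sqrt(M(M-1)) replaced by M."""
    return (delta - 1) * m / math.sqrt(2 * (alphabet_size - 1))
```

For the 173-letter example the unrounded standard deviation is about .041, which puts the sigmage a little over 18; the figure of 19 in the text follows from rounding the standard deviation to .04 before dividing.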

Notice that the sigmage is approximately a linear function of the sample size M, and also
linear with the “bulge” $\delta - 1$. The denominator is relatively unimportant to the estimation of S
except in shifting from code to cipher, when c can change from 500,000 to as small as 10.
The bulge $\delta - 1$ is a quantity which will recur frequently.

Formula (21) does not apply to the comparison I.C. For that we have the expected number
N/c and g found.

Then $\sigma^2 = \frac{N}{c}\left(1 - \frac{1}{c}\right)$ and the sigmage is

    $$S = \frac{g - N/c}{\sqrt{\frac{N}{c}\left(1 - \frac{1}{c}\right)}} = \frac{\iota - 1}{\sqrt{\dfrac{c-1}{N}}} = \frac{\iota - 1}{\sqrt{c-1}} \sqrt{N}. \tag{24}$$

In this case the sigmage is linear with the bulge $\iota - 1$, but varies only as the square root of the
sample size. It is important to use formulas (21) and (24) in the right contexts, and not confuse
them.



The significance of $S(\delta)$ is given in the table of Appendix VI, which lists the probability of
getting $S(\delta)$ or a larger result from chance.

In Appendix VI it is shown that the probability of $\delta$ attaining a limit, Prob ($\delta \ge \delta_0$), is the
same as the probability of the sigmage S attaining another related limit, Prob ($S \ge S_0$). The
advantage of using sigmage is that the probability is independent of the sample size, a great
convenience in tabulating. The tables of Appendix VI enable one to estimate the significance of a
sigmage as a probability.



If a probability Prob ($S \ge S_0$) is not in the table of Appendix VI then the entry
$P\left(\frac{c-1}{2}, \frac{b}{2}\right)$ is a good approximation, $b = S_0 \sqrt{2(c-1)} + c - 1$, where P(x,a) is the Poisson cumulative function,
tabulated in the Cryptanalyst’s Manual, Section 6-1, Table II, published by the National Security
Agency. If the probability is too small even for the Poisson tables then it is closely approximated

k — 

by Prob (S^k) = a — where a = 



For the comparison I.C. a different table must be used. Since the expected number of
coincidences is $\frac{N}{c}$ and the distribution of g is binomial, then Prob ($g \ge k$) is given by
$P\left(k, \frac{N}{c}\right)$, the Poisson distribution, and the approximation is good if c is large. From this
Prob ($\iota \ge k$) $\approx P\left(\frac{kN}{c}, \frac{N}{c}\right)$. Entries too small to be in the table are closely approximated
by $\frac{a^x e^{-a}}{x!}$, where $a = \frac{N}{c}$ and $x = \frac{kN}{c}$.

Example: A cipher message is suspected of being in depth with itself at interval 676. It 
is 1352 letters in length, giving an overlap of 676. Thus there are 26 coincidences expected from 
chance between the first and second halves. There are 50 observed. What is the significance 
of this? The Poisson table shows that this many or more will occur with probability .000013, or
only about once in a hundred thousand such experiments by chance.
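The example can be explored with a short Python sketch. It assumes the sigmage of a comparison is (g - N/c) divided by the binomial standard deviation, and it sums the Poisson probabilities directly rather than consulting a table; the function names `comparison_sigmage` and `poisson_tail` are hypothetical.

```python
import math

def comparison_sigmage(found: int, overlap: int, alphabet_size: int = 26) -> float:
    """(g - N/c) / sqrt((N/c)(1 - 1/c)) for g coincidences in an overlap of N."""
    expected = overlap / alphabet_size
    return (found - expected) / math.sqrt(expected * (1 - 1 / alphabet_size))

def poisson_tail(k: int, mean: float, terms: int = 500) -> float:
    """Probability of k or more events for a Poisson variable with the given mean."""
    total = 0.0
    for x in range(k, k + terms):
        # log of mean**x * exp(-mean) / x!, using lgamma to avoid overflow
        total += math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))
    return total

s = comparison_sigmage(50, 676)   # 50 coincidences found where 26 are expected
tail = poisson_tail(50, 26.0)     # chance of 50 or more with mean 26
```

The sigmage comes out at 4.8, and the summed Poisson tail is of the order of one in a hundred thousand, the same order of magnitude as the tabulated figure quoted above.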



VIII. The Cross I. C. 

Suppose we have two stretches of text of which the distributions are given by $p_1, p_2, \ldots, p_c$
and $q_1, q_2, \ldots, q_c$,

    $$\sum_{i=1}^{c} p_i = 1, \qquad \sum_{i=1}^{c} q_i = 1.$$

If these texts are placed one above the other the probability of an incidence at any one
position is

    $$\sum_{i=1}^{c} p_i q_i \tag{25}$$

and the expected comparison I.C. is

    $$\xi = c \sum p_i q_i.$$

We can show that the expected value of $\xi$ is 1.

If one of the samples of text is flat, say $q_i = 1/c$, then

    $$\xi = 1, \tag{26}$$

for

    $$\xi = c \sum p_i q_i = \frac{c}{c} \sum p_i = 1 \times 1 = 1.$$

If both samples are rough, then $\xi$ fluctuates widely as the q’s are permuted. The question arises,
how wide is the distribution of $\xi$.

A standard measure is the “variance”

    $$\sigma^2 = E(\xi^2) - [E(\xi)]^2. \tag{27}$$

From Appendix II we conclude

    $$\sigma^2 = \frac{(P-1)(Q-1)}{c-1} \tag{33}$$

where $P = c \sum p_i^2$ and $Q = c \sum q_i^2$ are the I.C.’s of the two texts.

This says that the square of a standard deviation $\sigma$ is the product of the bulges over c - 1.

    $$S = \frac{\xi - 1}{\sqrt{\dfrac{(P-1)(Q-1)}{c-1}}} = (\xi - 1) \sqrt{\frac{c-1}{(P-1)(Q-1)}}$$

is a measure of the significance which can be judged from the table of Appendix VI.
The function $\xi$ can be used as a measure of correlation between two distributions.
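For a small alphabet the mean and variance of the cross I. C. can be verified by brute force over every permutation of the q's. The sketch below assumes the cross I. C. is c times the sum of the products p_i q_i; the four-letter distributions and the function names are hypothetical, and exact fractions are used so that the comparison with (P - 1)(Q - 1)/(c - 1) is exact.

```python
from fractions import Fraction
from itertools import permutations

def cross_ic(p, q):
    """Expected comparison I. C.: c times the sum of p_i * q_i."""
    return len(p) * sum(a * b for a, b in zip(p, q))

def permutation_mean_and_variance(p, q):
    """Mean and variance of the cross I. C. over all permutations of q."""
    values = [cross_ic(p, perm) for perm in permutations(q)]
    mean = sum(values) / len(values)
    variance = sum(v * v for v in values) / len(values) - mean * mean
    return mean, variance

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
q = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
mean, variance = permutation_mean_and_variance(p, q)
```

Here `cross_ic(p, p)` and `cross_ic(q, q)` play the roles of P and Q. Enumeration is practical only for very small alphabets, since the number of permutations grows as c factorial.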



IX. The Coincidence Test Used to Align Secondary Alphabets into a Primary Alphabet. 

We give here a special application of the I. C. statistic. An actual problem from an ele- 
mentary course is used. Special frequency distribution tables were made of the sample lined up 
into 26 columns, i. e., lines of 26 letters each. 

This problem happens to be enciphered by means of a Vigenère table, the columns being used
in succession. Consequently if the cipher text is lined up 26 wide each column is enciphered by 
a monoalphabetic substitution. Each alphabet is a slide on that in the next column. If we knew 
the plain and cipher sequences the text could be decrypted. The problem is to recover these 
sequences. Since the sequence is not alphabetical, adjacent frequency counts as given in table 
C appear unrelated. But if we look at the rows they must be related by being slides of each 
other. If we can establish these slides we will have the cipher sequence. 

By use of this special table, table “C”, we can build up the cipher component used by match- 
ing these frequency distributions. In selecting distributions to match we want to obtain rows 
which have: 

(a) A maximum total number of letters involved (as the distribution will then be more re- 
liable). 

(b) A distribution with a normal count (i. e., similar to English). 
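The matching of two frequency rows can be sketched as a search over slides. This is an illustration under assumptions rather than the pamphlet's exact procedure: it treats the slide as cyclic over the columns, scores each slide by the sum of products of the aligned counts, and uses the hypothetical name `best_slide`.

```python
def best_slide(row_a, row_b):
    """Slide row_b cyclically against row_a; return (shift, score) with the largest score."""
    c = len(row_a)
    if len(row_b) != c:
        raise ValueError("rows must have the same length")
    scores = []
    for shift in range(c):
        # Sum of products of the counts that fall together at this slide.
        score = sum(row_a[i] * row_b[(i + shift) % c] for i in range(c))
        scores.append((score, shift))
    score, shift = max(scores)
    return shift, score
```

Dividing the winning score by the product of the two row totals and multiplying by the number of columns turns it into a cross I. C., whose significance can then be judged as in section VIII. Rows with few letters give unreliable scores, which is the reason for criterion (a).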
Table "C": Frequency Table of the Cryptogram

[Tally table: one row for each cipher letter A to Z, one column for each of the 26 text
columns (headed 1, 26, 25, ..., 3, 2), and a final column giving the total number of letters
in each row. The individual tally entries are not recoverable here.]







Table "D" - Sliding Strips

[Strips not recoverable from this copy. The table showed the "B" master distribution (plain positions 1 to 26, cipher B under No. 1, with the frequencies of B written twice in succession) against the "T" strip set at No. 2, and the T-1 N-5 master distribution, prepared the same way, against the "B" strip.]
Table "E" - Table of Total Coincidences

[Column alignment is lost in this copy; the entries of each row are listed in the order printed.]

B Master Distribution
Plain  - 1 2 3 ... 26
Cipher - B
T      - X 43 49 - 39 53 41 40 37 - 60 64
A      - X 38 66 - 59 42 50 - 42
N      - X 53 40 43 39 63

NOTE: The numbers 43, 49, etc., represent the successive numbers
of "coincidences", i.e., the sums of products of
frequencies at successive points of coincidence.

T Master Distribution
Plain  - 1 2 3 ... 26
Cipher - T
A      - X - 32 28 29 - 42 31 - 51 - 43
N      - X 28 50 - 28 26 34 34 31 - 32 34

A Master Distribution
Plain  - 1 2 3 ... 26
Cipher - A
N      - X - 26 28 26 - 43 - 34 27 27

T-1 N-5 Master Distribution
Plain  - 1 2 3 ... 26
Cipher - T N
B      - X X - 100 123 - 80 78 74 82 - 75 73 89 96
A      - X 57 56 - 61 65 - 94 - 65
Table "F" - Table of Coincidences
Master Distribution T-1 N-5 B-11 A-17

Plain  - 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Cipher - T V I D N Y U F P M  B  K  H  Q  Z  E  A  L  G  X  R  J  W  S  C  O

[Body of the table not recoverable from this copy. It held one row per cipher letter tested, giving the coincidences at each position, with a final column of chance coincidences.]

(c) A distribution without any one letter of abnormal frequency (as this gives too much 
weight to one letter). 

When frequency distributions of two letters are properly matched, high should pair with 
high, low with low, blank with blank, etc. The mathematical value of each relative position is 
found as follows: 

(a) With the two frequency distributions in question written on paper strips and slid one 
against the other, for any one position we multiply the frequencies in alignment, and then 
add the products of all these multiplications, (see table “D”). 

(b) Cross-multiply the total count of the first distribution by the count of the second. 
This is the total number of possible pairs of letters. Chance would produce one coinci- 
dence in twenty-six. Therefore, divide this product by twenty-six, which gives the 
number of expected chance coincidences. 

(c) Divide the number of the actual coincidences by the expected number of chance coinci- 
dences. The resulting number is the Index of Coincidence. 

To prove correct alignment: 

(a) The index for the given relative position of two distributions must be higher than for 
all other positions, with no close second. 

(b) The index should be 1.50 or higher (preferably 1.73 or higher). 

(c) There must be only one acceptable alignment. 

Indeterminate results will be encountered in some cases, particularly with insufficient text. 
Table “E” gives the coincidences at various positions of one strip slid against another. 
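
The calculation in steps (a) to (c) can be sketched in Python. This is a minimal sketch under stated assumptions, not the pamphlet's own procedure in code form: the function names and the 1-based `position` argument (the master column under which the sliding strip's indicator is set) are introduced here for illustration, and the demonstration distribution is invented.

```python
def coincidences(master, strip, position):
    """Step (a): sum of the products of the frequencies in alignment.

    `master` and `strip` are lists of 26 frequencies, each starting at its
    own column No. 1. `position` is the master column (1-based) under
    which the strip's indicator is set.
    """
    n = len(master)
    shift = position - 1
    return sum(strip[i] * master[(i + shift) % n] for i in range(n))


def expected_chance(master, strip):
    """Step (b): product of the two totals divided by the alphabet size."""
    return sum(master) * sum(strip) / len(master)


def index_of_coincidence(master, strip, position):
    """Step (c): actual coincidences over expected chance coincidences."""
    return coincidences(master, strip, position) / expected_chance(master, strip)


def rank_alignments(master, strip):
    """Indices for positions 2..26, highest first (position 1 is occupied)."""
    n = len(master)
    scored = [(index_of_coincidence(master, strip, p), p) for p in range(2, n + 1)]
    return sorted(scored, reverse=True)


# Invented demonstration data: the strip is the master slid six places.
demo_master = [5, 0, 3, 1, 0, 4, 0, 0, 2, 1, 0, 5, 0,
               3, 0, 1, 4, 0, 0, 2, 0, 1, 0, 3, 0, 1]
demo_strip = demo_master[6:] + demo_master[:6]
best_index, best_position = rank_alignments(demo_master, demo_strip)[0]
```

An alignment found this way would still have to meet the criteria above: it should stand clear of the second-best position and reach an index of about 1.50 or more.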

From table "C”, it is seen that certain distributions have the following properties (refer- 
ring to our three desired properties) : 

Row B (cipher) has a total of 36 letters, with 14 different letters involved. Its highest 
frequency is 5. Good. Approaches normality. Maximum text. 

V (cipher) has a total of 27 letters with 15 different letters involved. Its highest 
frequency is 4. Not good — too flat. 

K (cipher) has a total of 27 letters with 13 different letters involved. Its highest 
frequency is 5. Not good — too flat. 

G (cipher) has a total of 25 letters with 15 different letters involved. Its highest 
frequency is 5. Not good — too flat. 

D (cipher) has a total of 22 letters with only 10 different letters involved, but its high- 
est frequency is 7. Not good — too peaked. 

T (cipher) has a total of 26 letters with 11 different letters involved. Its highest 
frequency is 5. Good. Approaches normality. 

A (cipher) has a total of 25 letters with 11 different letters involved. Its highest 
frequency is 5. Good. Approaches normality. 

N (cipher) has a total of 24 letters with 11 different letters involved. Its highest 
frequency is 4. Good. Approaches normality. 

B, T, A and N are the best rows. Match T, A, and N against B, then match A and 
N against T, finally match N against A. One of these combinations should give a 
positive index of coincidence, and thus serve as a starting point. 

Building Up The Cipher Component 

By utilizing the principles described in the previous sections, we can build up the cipher 
component. Take B (cipher) as the master distribution, since it is acceptable and contains the 
highest count. Copy the frequencies of B (from table "C") at the bottom of a strip of paper, and 
repeat this sequence to the right. Over the first sequence write the numbers 1 to 26 as shown in 
table "D". Under No. 1 write the letter "B". The numbers represent the various unknown 
letters of the cipher component. Make similar master distribution strips for T and A. Next, 
copy the distribution of T (cipher) (from table "D") at the top of a strip of paper. Only one 
sequence is required for this strip, and the numbers are omitted. Indicate the space 
corresponding to column No. 1 (table "D") by the letter T (see table "E"). In a like manner make 
strips for A and N. 

Note: The letter on each strip is an indicator to mark column No. 1 for that letter. When 
strips are properly aligned, the indicators show the relative positions of these letters in 
the cipher component. The student is advised to prepare strips for himself and follow 
these processes. 

First, match T against B. As no two letters can occupy the same position in the cipher 
component, begin by setting the T indicator at No. 2 on the B master alphabet. Count the 
coincidences. Next, slide T to No. 3, and count the coincidences. Continue this process to No. 
26, and record the successive numbers of coincidences in tabular form (see table “F”). In many 
cases lack of sufficient coincidences will be obvious by inspection and the count need not be made. 
In this way we discover that B and T give high indices of coincidence in two different alignments 
(indices computed in accordance with rule in page 30). 

Index of B (1)-T (7)    64/36 = 1.77  (good)

Index of B (1)-T (11)   60/36 = 1.67  (good)


All other alignments give such low indices that they can be at once eliminated. The above two 
indices, however, are both high enough to be significant, and as the second is so close to the first, 
it cannot be disregarded. 

There can be only one acceptable point of coincidence; therefore, it is necessary to match A 
against B, and N against B, to see if more conclusive results can be attained. 



Index of B (1)-A (5)    66/35 = 1.89  (excellent)

Index of B (1)-A (7)    59/35 = 1.69  (good)

(Other alignments are eliminated)

Index of B (1)-N (21)   63/33 = 1.91  (excellent)

Index of B (1)-N (6)    53/33 = 1.60  (fair)

(Other alignments are eliminated)

Since there is no outstanding coincidence with "B" as the "master alphabet", try "T" and 
"A" as the "master alphabet": 

Index of T (1)-A (17)               51/25 = 2.04  (excellent)
Index of T (1)-A (19)               43/25 = 1.72  (good)
Index of T (1)-A (11)               42/25 = 1.68  (good)
Index of T (1)-N (5)                50/24 = 2.08  (excellent)
Index of T (1)-N (12), (16), (26)   34/24 = 1.42  (poor)
Index of A (1)-N (15)               43/23 = 1.87  (good)
Index of A (1)-N (17)               34/23 = 1.48  (poor)

The most certain combination is T (1)-N (5), and there is no doubt as to its correctness. This 
locates "N" relative to "T" in the cipher component, and allows us to consolidate their 
frequencies. 

For a new master distribution add the frequencies of T (at space No. 1) to those of N (at space 
No. 5) (see table "E"). Match "B" and "A" against this new master distribution: 

Index of T (1) & N (5)-B (11)          = 1.81  (good)
Index of T (1) & N (5)-B (7)    100/69 = 1.45  (poor)
Index of T (1) & N (5)-B (26)          = 1.40  (poor)
Index of T (1) & N (5)-A (17)    94/48 = 1.96  (excellent)
Index of T (1) & N (5)-A (15)    65/48 = 1.35  (very poor)
Index of T (1) & N (5)-A (19)    65/48 = 1.35  (very poor)

"B" and "A" can now be consolidated with "T" and "N" for the final "master distribution" as 
follows: 


Plain  - 1 2 3 ... 26
Cipher - T (1), N (5)

[Column alignment is lost in this copy; the nonblank frequencies of each strip are listed in the order printed.]

T&N    - 4 3 1 1 7 6 1 7 3 1 5 2 5 4
B      - 2 3 1 1 1 2 1 3 1 4 4 1 2 5 5
A      - 3 2 1 2 1 2 1 3 3 2 5
This locates "B" and "A" relative to "T" and "N" in the cipher component, in addition to giving 
the combined frequencies of all four letters. 

FINAL MASTER DISTRIBUTION 

Plain  - 1 2 3 ... 26
Cipher - T (1), N (5), B (11), A (17)

[Column alignment is lost in this copy; the nonblank combined frequencies are listed in the order printed.]

Comb. Freq. - 9 8 1 1 1 1 1 3 10 9 3 2 4 12 7 1 8 4 12 14



Analyze the preceding steps. T gave two possible alignments with B and, as we now see, 
the incorrect position gave the higher index. N also gave two possible alignments with B. (B 
was at fault due to its erratic letter distribution.) However, when T and N are combined, giving 
twice as many letters in the master distribution, B fitted in with only one possible alignment. 
Adding B and A gives twice as many letters in the master distribution, and this should make 
future results even more positive. The master distribution (of 100 letters or more) should 
approximate a normal frequency distribution and will give a standard to which all the other 
distributions can be referred. Hereafter, variations in the highest index of coincidence will be 
due entirely to the letter distribution of the various distributions themselves. 

For example: 

If the highest index is 1.7 - letter distribution is normal. 

If the highest index is 2.0 — high frequency letters predominate. 

If the highest index is 1.4 — the intermediate and low frequency letters predominate. 

Therefore, when matching the remaining letters, we can accept the highest index of coincidence 
as establishing coincidence, unless the second highest is too high. 

Continue the matching process and the reconstruction of the cipher component, noting that T, 
N, B and A are already located and thus may be deleted at once from further test. Begin with 
the letters of the highest frequency, as they should give the most positive results. When a letter 
is placed, delete this location from further test, as two letters cannot occupy the same space in 
the cipher component. 

Letters are added to the cipher component in the following order (see table "F"): 

"Master alphabet" T (1)-N (5)-B (11)-A (17) 

K (12) 1.46 (poor but acceptable) 
V (2)  1.51 (poor but acceptable) 
D (4)  2.05 (excellent) 
D (8)  1.76 (good) (D not certain) 
E (16) 1.86 (excellent) 
W (23) 1.96 (excellent) 
G (19) 1.80 (good) 

Note: With this many values a key-word (if any) sequence could be completed by inspection. 
In this case, the partially reconstructed cipher component gives no suggestion of a 
key-word sequence. 
R (21) 1.55 (fair but acceptable) 
M (10) 1.65 (good) 
M (14) 0.53 (M not certain) 
J (22) 1.89 (excellent) 
L (18) 1.71 (good) 
L (6)  1.60 (fair) (L not certain) 
S (24) 1.77 (good) 
C (25) 2.12 (excellent) 
U (7)  1.47 (poor but acceptable) 
Y (6)  1.89 (excellent) 


Note: This throws out L (6), but leaves L (18) as correct. 

I (3) (fair) 

I (13) 1.42 (poor) (I not certain) 

H (13) 1.63 (good) 

Note: This throws out I (13) and leaves I (3) as correct. 

P (9) 1.95 (excellent) 

X (20) 1.90 (excellent) 

O (26) 1.72 (good) 

Q (14) 1.59 (fair and acceptable) 

Note: This throws out M (14) and leaves M (10) as correct. 

Z (15) 1.77 (good) 

F (8) 1.45 (poor but acceptable) 

F (4) 1.20 (very poor) 

F (8) is correct and D (4) is correct 

The cipher component has now been completely recovered. 
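
The placements accepted above can be collected into the finished sequence mechanically. The following is a small illustrative sketch: the dictionary simply restates the accepted positions from the text (rejected alternatives such as L (6) and M (14) are left out), and the helper name `assemble_component` is an assumption, not a term from the pamphlet.

```python
# Accepted positions as stated in the text: the four master letters first,
# then each remaining letter at the alignment finally judged correct.
placements = {
    "T": 1, "N": 5, "B": 11, "A": 17,
    "K": 12, "V": 2, "D": 4, "E": 16, "W": 23, "G": 19,
    "R": 21, "M": 10, "J": 22, "L": 18, "S": 24, "C": 25,
    "U": 7, "Y": 6, "I": 3, "H": 13, "P": 9, "X": 20,
    "O": 26, "Q": 14, "Z": 15, "F": 8,
}


def assemble_component(placements, size=26):
    """Lay the letters out by position, refusing double or missing slots."""
    slots = [None] * size
    for letter, position in placements.items():
        if slots[position - 1] is not None:
            raise ValueError(f"position {position} claimed twice")
        slots[position - 1] = letter
    if None in slots:
        raise ValueError("some positions are still empty")
    return "".join(slots)


component = assemble_component(placements)
print(component)
```

The check for doubly claimed positions mirrors the rule that two letters cannot occupy the same space in the cipher component.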

Note: The process described above has actually built up the complete squared cipher 
table of a modified Vigenère table (it remains only to recover the plain component 
to complete the Vigenère table). We have written down the cipher component 
rather than the complete square table merely to save time and effort. 

X. The I. C. of the Modular Sum of Two Streams. 

Consider two streams, one of I. C. $\gamma_1 = c \sum q_i^2$, the other of I. C. $\gamma_2 = c \sum r_i^2$. If these 
streams are added letter by letter mod c, what is the expected value of the $\gamma$ I. C. of the sum, 
assuming independence of the two streams? 

It is 

$$\gamma = 1 + \frac{(\gamma_1 - 1)(\gamma_2 - 1)}{c - 1}. \qquad (34)$$

If we write $\beta = \gamma - 1$, the bulge, then 

$$\beta = \frac{\beta_1 \beta_2}{c - 1}. \qquad (35)$$

For proof see Appendix IV. 

Example: A certain cipher system works as follows. The clerk picks a verse from the 
Bible. He copies this verse and succeeding verses until he has enough for his purpose. Then 
he copies underneath it the plain text of his message. Then using a Vigenère square he 
combines the two, getting cipher text. This is "modular" addition. 

The cipher text will have a certain roughness. If the I. C. of his plain text is 1.7, and that 
of the Bible is 1.8, then the bulge of the cipher will be $\frac{.7 \times .8}{25} = .02$ approximately. The I. C. 
of the cipher text is expected to be 1.02. 
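
The prediction for the modular sum can be checked numerically on a small alphabet. The sketch below is illustrative only: it assumes, as in Appendix IV, that "expected" means the average over every permutation of one stream's frequencies, and the function names and the four-letter distributions are invented for the purpose.

```python
from itertools import permutations


def ic(p):
    """Gamma I.C. of a probability distribution: c times the sum of squares."""
    return len(p) * sum(x * x for x in p)


def sum_distribution(q, r):
    """Distribution of the letter-by-letter sum mod c of two independent streams."""
    c = len(q)
    return [sum(q[j] * r[(i - j) % c] for j in range(c)) for i in range(c)]


def average_ic_of_sum(q, r):
    """Average I.C. of the sum over all c! permutations of the r frequencies."""
    perms = list(permutations(r))
    return sum(ic(sum_distribution(q, perm)) for perm in perms) / len(perms)


def predicted_ic(gamma_1, gamma_2, c):
    """Formula (34): 1 + (gamma_1 - 1)(gamma_2 - 1) / (c - 1)."""
    return 1 + (gamma_1 - 1) * (gamma_2 - 1) / (c - 1)


# Invented four-letter distributions, small enough to enumerate 4! cases.
q = [0.4, 0.3, 0.2, 0.1]
r = [0.5, 0.25, 0.15, 0.1]
exact = average_ic_of_sum(q, r)
formula = predicted_ic(ic(q), ic(r), len(q))
```

For the Bible example in the text, `predicted_ic(1.7, 1.8, 26)` gives 1.0224, which rounds to the 1.02 quoted.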
XI. The Roughness of Mixed Texts. 

What happens to the I. C. when the text is made up of letters coming from two 
distributions? To fix our ideas, suppose that there are two streams of text available with I. C.s 1.0 and 
2.0 respectively. Then a new text is made up by taking a letter from one and then a letter from 
the other, repeating this until a sufficient sample is obtained. What should the I. C. of the 
mixture be? 

The answer is 1.25, which we can derive quickly this way. Suppose the frequencies of the 
first stream to be $q_1, q_2, \ldots$, with $c \sum q_i^2 = 1$, and those of the second stream to be $s_1, s_2, \ldots$, with 
$c \sum s_i^2 = 2$. Then the frequency $p_i$ of a letter in the mixed sample is $p_i = \tfrac{1}{2} q_i + \tfrac{1}{2} s_i$. Then 

$$\gamma = c \sum p_i^2 = c \sum \left( \frac{q_i + s_i}{2} \right)^2 = \frac{c}{4} \left( \sum q_i^2 + 2 \sum s_i q_i + \sum s_i^2 \right) = \tfrac{1}{4} \times 1 + \tfrac{1}{2} \xi + \tfrac{1}{4} \times 2 = \tfrac{3}{4} + \tfrac{1}{2} \xi,$$

where $\xi$ is the cross I.C. of the two components. The expected 
value of the cross I.C. is 1, as in VIII, so the expected value of $\gamma$ is $\tfrac{3}{4} + \tfrac{1}{2} = 1\tfrac{1}{4}$. 

A similar argument (given in Appendix V) shows that if there are mixed a number k of texts 
with bulges $\beta_i$, each present in the proportion $r_i$ where $\sum_{i=1}^{k} r_i = 1$, then the expected bulge of 
the mixture is 

$$\gamma - 1 = \sum_{i=1}^{k} r_i^2 \beta_i. \qquad (36)$$

That is, each contributes to the roughness according to the square of its presence. This 
is true only if the k distributions are unrelated. 

Example: Suppose a body of traffic consisting of 25 messages from a cipher system which 
for a given key gives an expected I.C. of 1.04. However, the messages are actually in two different 
keys, 10 from one key and 15 from another. What is the expected I.C.? 

$$E(\gamma) = 1 + .04 \left( \tfrac{2}{5} \right)^2 + .04 \left( \tfrac{3}{5} \right)^2 = 1 + .04 \left( \tfrac{13}{25} \right) \approx 1.02$$
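
Formula (36) is easy to apply mechanically. The following is a minimal sketch, assuming unrelated sources as the text requires; the function name `mixture_ic` and its argument layout are illustrative choices, and the proportions may be given as raw counts because they are normalized first.

```python
def mixture_ic(ics, proportions):
    """Expected I.C. of a mixture of unrelated sources, per formula (36).

    `ics` holds the I.C. of each source; `proportions` holds the share of
    each source (any positive weights, normalized here to sum to 1).
    """
    total = sum(proportions)
    shares = [p / total for p in proportions]
    return 1 + sum(r * r * (g - 1) for r, g in zip(shares, ics))


# The two worked cases from the text.
half_and_half = mixture_ic([1.0, 2.0], [1, 1])       # streams of I.C. 1.0 and 2.0
two_keys = mixture_ic([1.04, 1.04], [10, 15])        # 25 messages in two keys
```

Because each bulge is weighted by the square of its share, splitting traffic evenly between two keys halves the bulge, and an uneven split loses somewhat less.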



XII. The Relation Between Chi-Square and the Gamma I.C. 

The chi-square test is a standard statistic for comparing two distributions. It is more flex- 
ible than the I.C. and its use has been described in many places. It is also more tricky than the 
I.C., its very flexibility making it hard for a beginner to use intelligently. 

If $f_i$ and $g_i$ are the frequencies of corresponding objects in two counts of the same size 
$N = \sum_{i=1}^{c} f_i = \sum_{i=1}^{c} g_i$, then 

$$\chi^2 = \sum_{i=1}^{c} \frac{(f_i - g_i)^2}{g_i} \qquad (37)$$

is the usual definition of the chi-square test. The frequency $g_i$ may be the expected value of $f_i$ 
from some universe with probabilities $p_i$, so that $g_i = E(f_i) = p_i N$. Then (37) becomes 

$$\chi^2 = \sum_{i=1}^{c} \frac{(f_i - p_i N)^2}{p_i N} = \frac{1}{N} \sum_{i=1}^{c} \frac{f_i^2}{p_i} - N. \qquad (38)$$

All the predictions for the I.C. are made from the flat universe for which $p_i = 1/c$. Then (38) 
becomes 

$$\chi^2 = \frac{c}{N} \sum f_i^2 - N = N\gamma - N = N(\gamma - 1). \qquad (39)$$

This simple relation shows that under the assumption used, that the universe is flat, the 
chi-square test and the I.C. are essentially the same thing. The delta I.C. has a fixed expected value, 
1, which does not depend on the elusive number of degrees of freedom, and this is somewhat of 
an advantage. The probabilities of various scores, the "distribution", for the I.C. can be devised 
from that of chi-square. If the use of the chi-square test is indicated, as when the probabilities $p_i$ 
of the universe are not all equal, the cryptanalyst is referred to the paper "The Chi-square 
Test" by Greenwood, Lotz, and Barrett, AFSA-34, 14 March 1952. 
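
Relation (39) can be confirmed on any set of counts. The sketch below is illustrative: the function names and the sample counts are invented, and the gamma I.C. is taken as c times the sum of squared frequencies divided by the square of the total.

```python
def gamma_ic(counts):
    """Gamma I.C. of a frequency count: c * sum(f^2) / N^2."""
    n = sum(counts)
    c = len(counts)
    return c * sum(f * f for f in counts) / (n * n)


def chi_square_flat(counts):
    """Chi-square of the counts against the flat universe p_i = 1/c."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((f - expected) ** 2 / expected for f in counts)


# Invented counts over a six-letter alphabet.
sample = [5, 0, 3, 1, 2, 9]
lhs = chi_square_flat(sample)
rhs = sum(sample) * (gamma_ic(sample) - 1)   # N * (gamma - 1), formula (39)
```

The two quantities agree for every input, which is the sense in which the test and the I.C. are "essentially the same thing" for a flat universe.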

XIII. The I.C. of a Rectangular Array. 

It frequently happens that the cryptanalyst makes counts in a rectangular pattern, as for 
instance a set of frequency counts on a cipher using several alphabets, one column for each alpha- 
bet, one row for each letter. After the counts have been made the question arises, is this array 
significant? The I.C. can be applied to each column, or to each row, but the numbers thus ob- 
tained will not be independent, nor will they reflect recognizably the interdependences. A number 
of ways of dealing with this problem have been used, and no one method is outstanding. The 
cross I.C. or the chi-square between columns are sometimes used. The student is referred to 
"Statistics for Cryptology,” AFSA 14, 1 December 1951, by H. Campaigne. 

One treatment worthy of mention here is due to Dr. H. Gingerich. It is aimed only at 
measuring the information contained in a rectangular array, and not at extracting the information. 

Suppose the count in the ith row and jth column is $f_{ij}$. Let $t_j$ be the total for the jth column, 
$t_j = \sum_{i=1}^{n} f_{ij}$. Compute the delta I.C. of each column. To do this first compute 

$$N_j = \sum_{i=1}^{n} f_{ij} (f_{ij} - 1) \qquad (40)$$

and 

$$D_j = \frac{1}{c} \, t_j (t_j - 1). \qquad (41)$$

Then $\delta_j = N_j / D_j$. Now form 

$$\delta = \frac{\sum_{j=1}^{w} N_j}{\sum_{j=1}^{w} D_j}, \qquad (42)$$

a sort of average of the I.C.'s. This is seen 
to have the expected value $E(\delta) = 1$, since $E(N_j) = D_j$, and it 

^ c — l ^ 

can be shown to have the variance 2 af =— 5 — 2 Dj. (43) 

j = l * J = 1 
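
The pooled statistic of formulas (40) to (42) can be sketched as follows. This is an illustration under an assumption: each column is taken to have one row per letter, so the alphabet size c equals the column length. The function names and the small array are invented.

```python
def column_terms(column):
    """Return (N_j, D_j) for one column of counts, one row per letter."""
    c = len(column)                     # assumed: number of rows = alphabet size
    t = sum(column)
    n_j = sum(f * (f - 1) for f in column)
    d_j = t * (t - 1) / c
    return n_j, d_j


def pooled_delta(columns):
    """Formula (42): sum of the N_j over sum of the D_j."""
    terms = [column_terms(col) for col in columns]
    return sum(n for n, _ in terms) / sum(d for _, d in terms)


# Invented array over a three-letter alphabet: one flat and one peaked column.
array = [[2, 2, 2], [3, 0, 0]]
per_column = [n / d for n, d in (column_terms(col) for col in array)]
pooled = pooled_delta(array)
```

Pooling numerators and denominators, rather than averaging the column indices directly, lets the longer columns carry more weight.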

XIV. Conclusion and Problems. 

The material presented here has been collected in the hope of helping the cryptanalyst 
with his decision problems. More on this fundamentally sophisticated subject can be found in 
“Statistics for Cryptology” and "Probability and the Weighing of Evidence” by I. J. Good. 
Many papers on specific phases of decisions are to be found in the "Collected Papers of Mathe- 
matical Cryptology” or in the "Quarterly Summary” of NSA-34. 
Problem: A cipher text autokey is a cipher system in which the next cipher letter is 
determined by the last cipher letter and the present plain letter. What is the expected digraphic 
I.C. $\gamma_2$ of the cipher text? 

Problem: Some cipher machines have a switch which must be thrown one way to encipher 
and the other way to decipher. The switch reverses the direction of flow of information through 
the machine. A certain operator inadvertently uses decipher on his machine in preparing a 
message, and sends it that way. The recipient is unable to understand it and requests a resend. 
The operator repeats his operation, using the same setting as before, but with the switch now on 
encipher. You, the cryptanalyst, compare the two texts letter for letter. What comparison 
I.C. $\iota$ do you expect? 



APPENDICES 


APPENDIX I 

The Relation between the Monographic and Digraphic I. C. 



The monographic and digraphic I. C.'s are not independent. For if the probabilities of the 
individual letters are $p_1, p_2, \ldots, p_c$, then the expected probability $E(p_{ij})$ of the digraph ij is $p_i p_j$, 
ignoring the cohesion of the language, and for the moment imagining a sample like newspaper 
which has been cut into little pieces, one letter to a piece, which have been shuffled and arranged 
in a line. Using this estimate of $p_{ij}$ we get the expected value 

$$E(\gamma_2) = E\left( c^2 \sum_{i=1}^{c} \sum_{j=1}^{c} p_{ij}^2 \right) = c^2 \sum_{i=1}^{c} \sum_{j=1}^{c} p_i^2 p_j^2 = c \sum_{i=1}^{c} p_i^2 \cdot c \sum_{j=1}^{c} p_j^2 = \gamma_1^2.$$

It is true however that language does have cohesion and that each letter affects the frequency 
of occurrence of others in its vicinity. Usually then the digraphic I. C. is in excess of the 
estimate $\gamma_1^2$ above. We will sometimes calculate the ratio 

$x = \gamma_2 / \gamma_1^2$ and call it the "index of cohesion". 

Estimates of the higher I. C.'s can be made in the same way. 

One can show that 

$E(\gamma_k) = \gamma_{k-1} \gamma_1$ or $E(\gamma_k) = \gamma_{k-j} \gamma_j$. 

Differences between these estimates of $\gamma_k$ will be due to the various kinds of cohesion in the text. 
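
The relation between the monographic and digraphic I. C. can be illustrated numerically. This is a sketch with invented three-letter distributions; the function names are assumptions, and the digraph table is a list of rows whose entry [i][j] is the probability of the digraph ij.

```python
def monographic_ic(p):
    """c times the sum of squared letter probabilities."""
    return len(p) * sum(x * x for x in p)


def digraphic_ic(p2):
    """c^2 times the sum of squared digraph probabilities."""
    c = len(p2)
    return c * c * sum(x * x for row in p2 for x in row)


def shuffled_digraphs(p):
    """Digraph probabilities for text with no cohesion: p_ij = p_i * p_j."""
    return [[a * b for b in p] for a in p]


def index_of_cohesion(p, p2):
    """Ratio of the digraphic I.C. to the square of the monographic I.C."""
    return digraphic_ic(p2) / monographic_ic(p) ** 2


letters = [0.5, 0.3, 0.2]
no_cohesion = index_of_cohesion(letters, shuffled_digraphs(letters))

# Extreme invented case: every letter is always followed by itself.
doubled = [[0.5, 0, 0], [0, 0.3, 0], [0, 0, 0.2]]
strong_cohesion = index_of_cohesion(letters, doubled)
```

Shuffled text gives a ratio of exactly 1, while any dependence between neighbouring letters pushes the ratio above 1.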

One application of these relations occurs in the analysis of fractionating* systems, where as 
a preliminary to enciphering the text is expressed as a product of components and each component 
is enciphered separately, and then the cipher text is recombined into ordinary letters. For in- 
stance, each of c = 25 letters may be expressed as a two digit number, where the first digit is 
0, 1, 2, 3, or 4 and the second is 5, 6, 7, 8, or 9. The I. C. of the combined text is the product 
of the I. C.’s of the fractions, as shown by an argument similar to that for digraphs. 



*In this connection, the following works will be of interest: 

TM 32-220, "Basic Cryptography," Dept. of the Army, April 1950, pp. 155-163 

"Military Cryptanalysis, Part IV" by Wm. F. Friedman, OCSigO, Washington, 1941, pp. 144-184 

"Elementary Cryptanalysis," by Helen F. Gaines, American Photographic Publishing Co., Boston, 1944, 
pp. 209-212 
APPENDIX II 

A Proof that the Expected Cross I. C. is 1 

As in section VIII the relative frequencies of one text are $p_i$ and those for the other are $q_i$, where 
$\sum_{i=1}^{c} p_i = 1$ and $\sum_{i=1}^{c} q_i = 1$. We will use $P = c \sum_{i=1}^{c} p_i^2$ and $Q = c \sum_{i=1}^{c} q_i^2$ for the I.C.'s of the two 
texts. We wish to predict the value of $\xi = c \sum_{i=1}^{c} p_i q_i$. 

To do this we imagine the values of the q's permuted among themselves and observe what happens 
to $\xi$. That is, we imagine another sample of text replacing the second, a sample for which the 
frequency numbers are the same but not necessarily for the same letters. The I. C. of the new 
sample is necessarily the same as that of the original, since the I. C. does not depend on which 
letters are frequent. The average of the $\xi$ over all possible permutations of the q's will be seen 
to be 1. That is, we will calculate 

$E(\xi) = \frac{1}{c!} \sum_p \xi$, where the p under the summation sign means the sum over all the permutations 
of the q's, c! in number. Thus 

$$E(\xi) = \frac{1}{c!} \sum_p c \sum_{i=1}^{c} p_i q_i'$$

where $q_i'$ is one of the q's, depending on the permutation. Each q will appear with a given 
subscript i exactly $(c-1)!$ times. 

Thus $\sum_p q_i' = (c-1)! \sum_{i=1}^{c} q_i = (c-1)!$, 

and $E(\xi) = \frac{c}{c!} \sum_{i=1}^{c} p_i \sum_p q_i' = \frac{c}{c!} \sum_{i=1}^{c} p_i (c-1)! = \sum_{i=1}^{c} p_i = 1$. 

Notice that this result is independent of the roughness of either distribution. 

To find the variance of $\xi$ we evaluate 

$$E(\xi^2) = \frac{1}{c!} \sum_p \left( c \sum_{i=1}^{c} p_i q_i' \right)^2 = \frac{c^2}{c!} \sum_p \sum_{i=1}^{c} \sum_{j=1}^{c} p_i q_i' p_j q_j' = \frac{c^2}{c!} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i p_j \sum_p q_i' q_j'.$$

The term $\sum_p q_i' q_j'$ will depend on whether $i \neq j$ or $i = j$. For $i \neq j$ we have 

$$\sum_p q_i' q_j' = (c-2)! \sum_{i=1}^{c} \sum_{j \neq i} q_i q_j = (c-2)! \sum_{i=1}^{c} q_i \sum_{j \neq i} q_j = (c-2)! \sum_{i=1}^{c} q_i (1 - q_i) = (c-2)! \left( \sum_{i=1}^{c} q_i - \sum_{i=1}^{c} q_i^2 \right) = (c-2)! \left( 1 - \frac{Q}{c} \right).$$

For $i = j$ we have 

$$\sum_p q_i'^2 = (c-1)! \sum_{i=1}^{c} q_i^2 = (c-1)! \frac{Q}{c}.$$

Thus 

$$E(\xi^2) = \frac{c^2}{c!} \sum_{i=1}^{c} p_i \sum_{j \neq i} p_j \, (c-2)! \left( 1 - \frac{Q}{c} \right) + \frac{c^2}{c!} \sum_{i=1}^{c} p_i^2 \, (c-1)! \frac{Q}{c}$$

$$= \frac{c^2}{c!} (c-2)! \left( 1 - \frac{Q}{c} \right) \sum_{i=1}^{c} p_i \sum_{j \neq i} p_j + \frac{c^2}{c!} (c-1)! \frac{Q}{c} \sum_{i=1}^{c} p_i^2$$

$$= \frac{(c - Q)(c - P)}{(c-1) c} + \frac{QP}{c} = \frac{c - P - Q + QP}{c - 1}.$$

Now by definition 

$$\sigma^2(\xi) = E(\xi^2) - [E(\xi)]^2 = \frac{c - P - Q + QP}{c - 1} - 1 = \frac{1 - P - Q + QP}{c - 1} = \frac{(P - 1)(Q - 1)}{c - 1}.$$

APPENDIX III 

The Standard Deviation of the I.C. 

1. In Kullback's "Statistical Methods in Cryptanalysis", Appendix D, page 151, the standard 
deviation of $\Phi = \sum_{i=1}^{c} f_i (f_i - 1)$ is derived for a universe with prescribed probabilities $p_i$. For 
convenience the abbreviations $S_2 = \sum_{i=1}^{c} p_i^2$, $S_3 = \sum_{i=1}^{c} p_i^3$, and $S_4 = \sum_{i=1}^{c} p_i^4$ are made. The 
result is then obtained, 

$$\sigma^2(\Phi) = 4M^3 (S_3 - S_2^2) + 2M^2 (-6 S_3 + S_2 + 5 S_2^2) + 2M (4 S_3 - S_2 - 3 S_2^2).$$

For the flat universe with $p_1 = \cdots = p_c = \frac{1}{c}$ we have $S_2 = \frac{1}{c}$, $S_3 = \frac{1}{c^2}$, $S_4 = \frac{1}{c^3}$. Then 

$$\sigma^2(\Phi) = \frac{2 M (M - 1)(c - 1)}{c^2}.$$

The relation between $\Phi$ and $\delta$ is that 

$$\delta = \frac{c}{M(M-1)} \sum_{i=1}^{c} f_i (f_i - 1) = \frac{c \, \Phi}{M(M-1)}.$$

Therefore 

$$\sigma^2(\delta) = \frac{c^2}{M^2 (M-1)^2} \, \sigma^2(\Phi) = \frac{2 (c - 1)}{M (M - 1)}.$$

The relation between $\delta$ and $\gamma$ is that $\gamma = \frac{M-1}{M} \delta + \frac{c}{M}$, hence 

$$\sigma^2(\gamma) = \left( \frac{M - 1}{M} \right)^2 \sigma^2(\delta).$$

2. The comparison I.C. is $\iota = \frac{c \, g}{N}$. The numerator g is binomially distributed with 
probabilities $p = \frac{1}{c}$ and $q = 1 - \frac{1}{c}$ of success and failure, assuming the random universe. Then 

$$\sigma^2(g) = N p q = N \frac{c - 1}{c^2}. \quad \text{Therefore} \quad \sigma^2(\iota) = \frac{c^2}{N^2} \sigma^2(g) = \frac{c - 1}{N}.$$
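
The two results of Appendix II, that the cross I. C. averages to 1 over all permutations and has variance (P-1)(Q-1)/(c-1), can be checked by brute force on a small alphabet. The sketch below is illustrative only: the function names and the four-letter distributions are invented, and the variance is taken over the full set of c! permutations, as in the proof.

```python
from itertools import permutations


def ic(p):
    """Gamma I.C. of a probability distribution."""
    return len(p) * sum(x * x for x in p)


def cross_ic(p, q):
    """Cross I.C.: c times the sum of products of matching probabilities."""
    return len(p) * sum(a * b for a, b in zip(p, q))


def cross_ic_moments(p, q):
    """Mean and variance of the cross I.C. over every permutation of q."""
    values = [cross_ic(p, perm) for perm in permutations(q)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


# Invented four-letter distributions.
p = [0.4, 0.3, 0.2, 0.1]
q = [0.5, 0.25, 0.15, 0.1]
mean, variance = cross_ic_moments(p, q)
predicted_variance = (ic(p) - 1) * (ic(q) - 1) / (len(p) - 1)
```

Enumeration is only practical for very small alphabets, since the number of permutations grows as c!.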
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APPENDIX IV 

Proof of the Formula of Section X 



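The probability $p_i$ of the $i$th letter occurring in the sum is $p_i = \sum_j q_j r_{i-j}$.

Then $\gamma = c \sum_i p_i^2 = c \sum_i \sum_j \sum_k q_j q_k r_{i-j} r_{i-k}$.

We compute the expected value of $\gamma$ by making all substitutions in turn on one of the two streams and averaging the results.

$$E(\gamma) = \frac{c}{c!} \sum_{\text{perm } r} \sum_i \sum_j \sum_k q_j q_k r_{i-j} r_{i-k} = \frac{c}{c!} \sum_i \sum_j \sum_k q_j q_k \sum_{\text{perm } r} r_{i-j} r_{i-k}.$$

But

$$\sum_{\text{perm } r} r_s r_t = (c-2)! \sum_{s \ne t} r_s r_t = (c-2)!\left(1 - \sum_s r_s^2\right).$$

Therefore

$$E(\gamma) = \frac{c}{c!} \sum_i \left[ \sum_j q_j^2 \sum_{\text{perm } r} r_{i-j}^2 + \sum_{j \ne k} q_j q_k \sum_{\text{perm } r} r_{i-j} r_{i-k} \right]$$

$$= \frac{c^2}{c!} \left[ (c-1)! \sum_j q_j^2 \sum_k r_k^2 + \left(1 - \sum_j q_j^2\right)\left(1 - \sum_k r_k^2\right)(c-2)! \right]$$

$$= \frac{c^2}{c!} \left[ (c-1)!\,\frac{\gamma_1}{c}\,\frac{\gamma_2}{c} + (c-2)!\left(1 - \frac{\gamma_1}{c}\right)\left(1 - \frac{\gamma_2}{c}\right) \right]$$

$$= \frac{1}{c-1} \left[ (c-1)\frac{\gamma_1\gamma_2}{c} + c - \gamma_1 - \gamma_2 + \frac{\gamma_1\gamma_2}{c} \right]$$

$$= \frac{\gamma_1\gamma_2 - \gamma_1 - \gamma_2 + c}{c-1} = \frac{(\gamma_1 - 1)(\gamma_2 - 1)}{c-1} + \frac{c-1}{c-1} = 1 + \frac{\beta_1\beta_2}{c-1}.$$

Here $\gamma_1 = c \sum_j q_j^2$ and $\gamma_2 = c \sum_k r_k^2$ are the I.C.'s of the two streams, and $\beta_1 = \gamma_1 - 1$, $\beta_2 = \gamma_2 - 1$ are their bulges.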
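The averaging argument of this appendix can be verified by brute force for a small alphabet. The sketch below is an illustration only: it assumes a 5-letter alphabet, two made-up distributions `q` and `r`, and treats "all substitutions" as all permutations of the probabilities of one stream.

```python
from fractions import Fraction
from itertools import permutations


def ic(probs):
    """Gamma I.C. of a probability distribution: c * sum p_i^2."""
    return len(probs) * sum(p * p for p in probs)


def sum_stream_ic(q, r):
    """I.C. of the sum (mod c) of two independent streams with distributions q and r."""
    c = len(q)
    p = [sum(q[j] * r[(i - j) % c] for j in range(c)) for i in range(c)]
    return ic(p)


def expected_ic_over_substitutions(q, r):
    """Average the I.C. of the sum over every substitution (permutation) applied to r."""
    perms = list(permutations(r))
    return sum(sum_stream_ic(q, list(rp)) for rp in perms) / len(perms)


# Hypothetical distributions on a 5-letter alphabet (assumed for illustration).
q = [Fraction(n, 10) for n in (4, 3, 2, 1, 0)]
r = [Fraction(n, 20) for n in (9, 5, 3, 2, 1)]
c = len(q)

average = expected_ic_over_substitutions(q, r)
formula = 1 + (ic(q) - 1) * (ic(r) - 1) / (c - 1)
print(average, formula)
```

The enumeration grows as c!, so this check is only practical for very small alphabets; the closed form is what one would use for c = 26.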
APPENDIX V

Proof of the Mixture Formula 



We have assumed $k$ sources of text, each with its own frequency distribution. Suppose the $i$th distribution to be $p_{i1}, p_{i2}, p_{i3}, \ldots$ with

$$\beta_i = c \sum_{j=1}^{c} p_{ij}^2 - 1, \qquad i = 1, 2, \ldots, k.$$

Suppose that the relative proportions in which the $k$ sources are used are $r_1 : r_2 : \ldots : r_k$ with $\sum_{i=1}^{k} r_i = 1$.

Then if $q_j$ is the proportion of a letter in the resulting mixture,

$$q_j = r_1 p_{1j} + r_2 p_{2j} + \ldots + r_k p_{kj}.$$

The I.C. of the mixture is

$$\gamma = c \sum_{j=1}^{c} q_j^2 = c \sum_{j=1}^{c} \left[ \sum_{i=1}^{k} r_i p_{ij} \right]^2 = c \sum_{j=1}^{c} \sum_{i=1}^{k} \sum_{h=1}^{k} r_i r_h p_{ij} p_{hj}$$

$$= \sum_{i=1}^{k} \sum_{h=1}^{k} r_i r_h\, c \sum_{j=1}^{c} p_{ij} p_{hj} = \sum_{i=1}^{k} \sum_{h=1}^{k} r_i r_h \xi_{ih}$$

where $\xi_{ih} = c \sum_{j=1}^{c} p_{ij} p_{hj}$ is the cross I.C. of the $i$th and $h$th sources. The special case $\xi_{ii} = \gamma_i$ is the I.C. of the $i$th source.

Now the I.C. becomes

$$\gamma = \sum_{i=1}^{k} \sum_{h \ne i} r_i r_h \xi_{ih} + \sum_{i=1}^{k} r_i^2 \gamma_i.$$

If the $k$ sources are unrelated then $E(\xi_{ih}) = 1$ and we have

$$E(\gamma) = \sum_{i=1}^{k} \sum_{h \ne i} r_i r_h + \sum_{i=1}^{k} r_i^2 \gamma_i = \sum_{i=1}^{k} \sum_{h \ne i} r_i r_h + \sum_{i=1}^{k} r_i^2 (1 + \gamma_i - 1)$$

$$= \sum_{i=1}^{k} \sum_{h=1}^{k} r_i r_h + \sum_{i=1}^{k} r_i^2 \beta_i$$

where $\beta_i = \gamma_i - 1$ is the bulge. In this last algebraic manipulation we have removed the restriction on the range of the summation index $h$ by changing from I.C. to bulge. Without the restriction we have

$$\sum_{i=1}^{k} \sum_{h=1}^{k} r_i r_h = \sum_{i=1}^{k} r_i \sum_{h=1}^{k} r_h = 1 \quad \text{so that} \quad E(\beta) = E(\gamma) - 1 = \sum_{i=1}^{k} r_i^2 \beta_i.$$
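The mixture identity can be illustrated numerically. The sketch below is an assumption-laden example, not the pamphlet's own computation: the two source distributions and the weights are invented, and one source is chosen flat so that its cross I.C. with any other source is exactly 1, which makes the expected-bulge formula hold exactly.

```python
from fractions import Fraction


def cross_ic(p, q):
    """Cross I.C. of two distributions: c * sum p_j q_j (the I.C. itself when p is q)."""
    return len(p) * sum(a * b for a, b in zip(p, q))


def mixture_ic(sources, weights):
    """I.C. of the mixture, computed directly from the mixed letter proportions q_j."""
    c = len(sources[0])
    mix = [sum(w * src[j] for w, src in zip(weights, sources)) for j in range(c)]
    return cross_ic(mix, mix)


def mixture_ic_from_cross(sources, weights):
    """The same I.C. as the double sum of r_i * r_h * xi_ih."""
    return sum(ri * rh * cross_ic(pi, ph)
               for ri, pi in zip(weights, sources)
               for rh, ph in zip(weights, sources))


def expected_bulge_unrelated(sources, weights):
    """E(beta) = sum r_i^2 * beta_i, valid when every cross I.C. has expectation 1."""
    return sum(w * w * (cross_ic(p, p) - 1) for w, p in zip(weights, sources))


# Hypothetical example on a 4-letter alphabet: one rough source and one flat source.
rough = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
flat = [Fraction(1, 4)] * 4
sources = [rough, flat]
weights = [Fraction(1, 4), Fraction(3, 4)]

print(mixture_ic(sources, weights) - 1, expected_bulge_unrelated(sources, weights))
```

The example shows how quickly a bulge is diluted: a source with bulge 3/8 used a quarter of the time contributes only (1/4)² × 3/8 = 3/128 to the mixture.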
APPENDIX VI



I. C. DISTRIBUTION AND TABLES 



1. Distribution of the γ or δ I. C. (χ² method).

2. Distribution of the Cross I. C.

3. Distribution of ι I. C.

4. Tables of I. C.

5. Short Cumulative Poisson Table

6. Short Table of Logarithms of Factorials

1. Distribution of the γ or δ I. C. (χ² method).

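If we have a frequency count $f_1, f_2, \ldots, f_c$ in a $c$ letter alphabet where $\sum_{i=1}^{c} f_i = N$, then

$$\chi^2 = \sum_{i=1}^{c} \frac{\left(f_i - \frac{N}{c}\right)^2}{\frac{N}{c}} = \frac{c}{N} \sum_{i=1}^{c} f_i^2 - N = N(\gamma - 1), \quad \text{where } \gamma = \frac{c}{N^2} \sum_{i=1}^{c} f_i^2 \text{ is the } \gamma \text{ I.C.}$$

The distribution of $\chi^2$ is known asymptotically. This case has $c - 1$ degrees of freedom.

$$P(\gamma \le X) = P(\chi^2 \le N[X-1]) = \frac{2^{\frac{1-c}{2}}}{\Gamma\left(\frac{c-1}{2}\right)} \int_0^{N(X-1)} t^{\frac{c-3}{2}} e^{-\frac{t}{2}}\,dt \qquad \text{(See Cramér p. 234.)}$$

$$= \frac{1}{\Gamma\left(\frac{c-1}{2}\right)} \int_0^{\frac{N(X-1)}{2}} u^{\frac{c-3}{2}} e^{-u}\,du.$$

$$E(\gamma) = 1 + \frac{c-1}{N} = \gamma_0.$$

Let

$$S = \frac{N(\gamma - 1) - (c-1)}{\sqrt{2(c-1)}} = \frac{\gamma - \gamma_0}{\sqrt{2(c-1)}/N}.$$

Then

$$P(S \le S_0) = P(\gamma \le \gamma_0 + S_0\sigma) = P\left[\gamma \le 1 + \frac{c-1}{N} + S_0\,\frac{\sqrt{2(c-1)}}{N}\right] = \frac{1}{\Gamma\left(\frac{c-1}{2}\right)} \int_0^{\frac{c-1+S_0\sqrt{2(c-1)}}{2}} u^{\frac{c-3}{2}} e^{-u}\,du.$$

If we take $\sqrt{\frac{N-1}{N}} = 1$ (error less than 1% for $N > 50$), the above is reasonably accurate and can be tabulated for values of sigmage $S$ without regard for sample size.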



Values for the above are tabulated by translation from Pearson's Tables of the Incomplete 
Gamma Function. 



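In addition,

$$\delta = \frac{N}{N-1}\,\gamma - \frac{c}{N-1}.$$

For each particular $\delta_0$ we get a corresponding $\gamma_0$ such that

$$\delta_0 = \frac{N}{N-1}\,\gamma_0 - \frac{c}{N-1} \quad \text{or} \quad \gamma_0 = \frac{\delta_0(N-1) + c}{N}.$$

Notice that

$$P(\delta < \delta_0) = P\left[\frac{N}{N-1}\,\gamma - \frac{c}{N-1} < \frac{N}{N-1}\,\gamma_0 - \frac{c}{N-1}\right]$$

and this is merely $P(\gamma < \gamma_0)$.

Therefore the table is valid for $\delta$ as well as $\gamma$.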

As a rough rule of thumb these tables are applicable if each category has more than 5 tallies. 
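The sigmage of a γ I.C. and its tail probability can be computed directly instead of being read from the tables. The following is a minimal sketch under stated assumptions: the function names are invented for illustration, and the incomplete gamma function is evaluated with its standard power series rather than by translation from Pearson's tables.

```python
import math


def gamma_ic(counts):
    """Gamma I.C. of a frequency count: c * sum f_i^2 / N^2."""
    n, c = sum(counts), len(counts)
    return c * sum(f * f for f in counts) / (n * n)


def gamma_sigmage(counts):
    """S = (N(gamma - 1) - (c - 1)) / sqrt(2(c - 1))."""
    n, c = sum(counts), len(counts)
    chi2 = n * (gamma_ic(counts) - 1)
    return (chi2 - (c - 1)) / math.sqrt(2 * (c - 1))


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def tail_prob(sigmage, c):
    """P[S(gamma) >= sigmage] for a c letter alphabet, chi-square with c - 1 d.f."""
    chi2 = (c - 1) + sigmage * math.sqrt(2 * (c - 1))
    if chi2 <= 0:
        return 1.0
    return 1.0 - reg_lower_gamma((c - 1) / 2, chi2 / 2)


print(round(tail_prob(0.0, 10), 3), round(tail_prob(2.0, 10), 4))
```

For C = 10 this gives values that agree with the tabulated Gamma I. C. entries at k = 0.0 and k = 2.0 to within rounding. The series converges slowly when the argument is far above the degrees of freedom, so a continued-fraction form would be a better choice for very large sigmages.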



2. Distribution of the Cross I. C. 

The cross I.C. $\xi$ is a measure of the fit between two frequency counts. Let one of these be $f_i$ and the other $g_i$, with $\sum_{i=1}^{c} f_i = M$ and $\sum_{i=1}^{c} g_i = N$ and $c$ = the number of letters in the alphabet.

Then $\xi = \dfrac{c \sum f_i g_i}{MN}$.

Let $P = \dfrac{c \sum f_i^2}{M^2}$ and $Q = \dfrac{c \sum g_i^2}{N^2}$ be the I.C.'s.

Then $\sigma^2(\xi) = \dfrac{(P-1)(Q-1)}{c-1}$ is the variance, and

$$S(\xi) = \frac{\xi - 1}{\sigma(\xi)} = \frac{(\xi - 1)\sqrt{c-1}}{\sqrt{(P-1)(Q-1)}} \text{ is the sigmage.}$$

The distribution of S(|) has the fortunate property of being approximately independent of 
the sample size. This makes it feasible to tabulate its distribution. 

The table, each entry of which is the probability of getting a certain sigmage or higher, was 
made by translating from Pearson’s Table of the Incomplete Beta Function by the formula 

t 1 )- 



2 

See series of reports on Gleason’s “Distribution of the Correlation Coefficient”. 
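The cross I.C., its variance, and its sigmage can be sketched in a few lines, and the variance formula can be confirmed by enumerating every rearrangement of one count against the other. This is an illustrative sketch only: the function names and the two 4-letter frequency counts are assumptions, and P and Q are taken in the form $c\sum f_i^2 / M^2$.

```python
from fractions import Fraction
from itertools import permutations
import math


def ic(counts):
    """I.C. of a frequency count in the form c * sum f_i^2 / M^2."""
    n = sum(counts)
    return Fraction(len(counts) * sum(f * f for f in counts), n * n)


def cross_ic(f, g):
    """Cross I.C.: c * sum f_i g_i / (M N)."""
    return Fraction(len(f) * sum(a * b for a, b in zip(f, g)), sum(f) * sum(g))


def cross_sigmage(f, g):
    """S(xi) = (xi - 1) * sqrt(c - 1) / sqrt((P - 1)(Q - 1)); undefined for a flat count."""
    c = len(f)
    var = (ic(f) - 1) * (ic(g) - 1) / (c - 1)
    return float(cross_ic(f, g) - 1) / math.sqrt(var)


def permutation_moments(f, g):
    """Mean and variance of xi over all c! rearrangements of g against f."""
    vals = [cross_ic(f, gp) for gp in permutations(g)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


f = [5, 3, 1, 1]  # hypothetical counts on a 4-letter alphabet
g = [4, 4, 1, 1]
mean, var = permutation_moments(f, g)
print(mean, var, cross_sigmage(f, g))
```

A count matched against itself gives the largest possible sigmage, $\sqrt{c-1}$, which is why the Cross I. C. columns of the tables run out at smaller k for small alphabets.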



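3. Distribution of ι I. C.

$$\iota = \frac{ch}{N}$$

where

h = Number of hits,
N = Amount of overlap,
c = Number of categories (letters in alphabet),

and

p = 1/c = Probability of individual hit occurring.

The distribution of $\iota$ is exactly binomial, and

$$P(h = X) = \binom{N}{X} \left(\frac{1}{c}\right)^{X} \left(1 - \frac{1}{c}\right)^{N-X} = \frac{1}{c^N} \binom{N}{X} (c-1)^{N-X}.$$

Then

$$P(h \ge X) = \frac{1}{c^N} \sum_{j=X}^{N} \binom{N}{j} (c-1)^{N-j}$$

and

$$P(\iota \ge k) = P\left[h \ge \frac{Nk}{c}\right] = \frac{1}{c^N} \sum_{j=Nk/c}^{N} \binom{N}{j} (c-1)^{N-j}.$$

Values for this can be found in appropriate binomial distribution tables with $n = N$, $r = \frac{Nk}{c}$ and $p = \frac{1}{c}$.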
c 

These values can be very well approximated for cases where $N > 10c$ by entering Poisson Table II with $a = \frac{N}{c}$, $X = \frac{Nk}{c}$.

Tables for the distribution of $\iota$ are therefore not presented herein.
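Since no table is given for ι, the exact binomial tail and its Poisson approximation are easy to compute. The sketch below is illustrative: the function names and the sample figures (an overlap of 520 letters with 30 hits in a 26 letter alphabet) are assumptions, and the Poisson tail is the probability of X or more events with mean a.

```python
import math


def iota_ic(hits, overlap, c):
    """Iota I.C.: c * h / N."""
    return c * hits / overlap


def binomial_tail(n, x, p):
    """Exact P(h >= x) for n trials with hit probability p."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(x, n + 1))


def poisson_tail(x, a):
    """P(X >= x) for a Poisson variable with mean a."""
    return 1.0 - sum(math.exp(-a) * a ** j / math.factorial(j) for j in range(x))


N, c, hits = 520, 26, 30  # hypothetical overlap, alphabet size and hit count
print(iota_ic(hits, N, c))
print(binomial_tail(N, hits, 1 / c), poisson_tail(hits, N / c))
```

The direct binomial sum is adequate for overlaps of a few hundred letters; for much longer overlaps the terms should be accumulated in logarithms to avoid floating-point overflow and underflow.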
4. Tables of I. C.


k     Normal    Cross I. C.  P[S(ξ) ≥ k]                Gamma I. C.  P[S(γ) ≥ k]
      P(S ≥ k)  C=10      C=26      C=30      C=32      C=10      C=15      C=17      C=19

0.0   .500      .500      .500      .500      .500      .437      .450      .453      .456
0.1   .460      .464      .461      .462      .461      .399      .411      .414      .417
0.2   .421      .427      .423      .421      .423      .363      .374      .377      .380
0.3   .382      .392      .385      .384      .385      .329      .339      .342      .344
0.4   .345      .357      .349      .349      .348      .297      .307      .309      .311
0.5   .308      .323      .313      .315      .312      .268      .276      .278      .279
0.6   .274      .290      .280      .278      .248      .240      .247      .249      .250
0.7   .242      .258      .248      .247      .246      .215      .221      .222      .223
0.8   .212      .228      .217      .218      .216      .192      .197      .198      .198
0.9   .184      .200      .189      .188      .188      .171      .174      .175      .176
1.0   .159      .173      .164      .163      .162      .152      .154      .155      .155
1.1   .136      .149      .140      .140      .139      .135      .136      .136      .136
1.2   .115      .126      .119      .119      .118      .119      .120      .120      .120
1.3   .0968     .106      .100      .0989     .0990     .105      .105      .105      .104
1.4   .0808     .0871     .0830     .0826     .0824     .0927     .0918     .0914     .0910
1.5   .0668     .0706     .0682     .0686     .0678     .0815     .0801     .0796     .0790
1.6   .0548     .0563     .0555     .0550     .0552     .0715     .0697     .0691     .0684
1.7   .0446     .0439     .0446     .0446     .0445     .0627     .0605     .0598     .0591
1.8   .0359     .0333     .0354     .0358     .0354     .0548     .0524     .0516     .0509
1.9   .0287     .0247     .0278     .0284     .0278     .0478     .0452     .0445     .0437
2.0   .0228     .0177     .0214     .0216     .0215     .0417     .0390     .0382     .0374
2.1   .0179     .0121     .0163     .0167     .0165     .0363     .0336     .0328     .0320
2.2   .0139     .00795    .0122     .0127     .0125     .0315     .0288     .0280     .0273
2.3   .0107     .00488    .00903    .00925    .00934    .0274     .0247     .0239     .0232
2.4   .0082     .00273    .00654    .00682    .00711    .0237     .0211     .0204     .0197
2.5   .0062     .00140    .00465    .00494    .00514    .0205     .0180     .0173     .0167
2.6   .0047     .000602   .00324    .00351    .00364    .0178     .0153     .0147     .0141
2.7   .0035     .000194   .00220    .00237    .00253    .0153     .0131     .0124     .0119
2.8   .0026     .0000447  .00146    .00161    .00172    .0132     .0111     .0105     .0100
2.9   .0019     .0000037  .000949   .00109    .00118    .0114     .00941    .00888    .00841


k     Gamma I. C.  P[S(γ) ≥ k]
      C=20      C=21      C=23      C=25      C=26      C=30      C=32      C=35      C=40

0.0   .457      .458      .459      .462      .463      .465      .466      .468      .469
0.1   .418      .419      .421      .423      .424      .426      .427      .429      .431
0.2   .381      .382      .384      .385      .386      .388      .389      .391      .393
0.3   .345      .346      .348      .349      .350      .352      .353      .355      .356
0.4   .312      .313      .314      .315      .316      .318      .319      .320      .322
0.5   .280      .281      .282      .284      .284      .286      .287      .287      .289
0.6   .251      .252      .253      .254      .254      .255      .256      .257      .258
0.7   .224      .224      .225      .226      .226      .227      .228      .229      .230
0.8   .199      .199      .200      .201      .201      .202      .202      .203      .203
0.9   .176      .176      .177      .177      .178      .178      .078      .179      .179
1.0   .155      .156      .156      .156      .156      .156      .157      .157      .157
1.1   .137      .137      .137      .137      .137      .137      .137      .137      .137
1.2   .120      .120      .120      .120      .120      .119      .119      .119      .119
1.3   .104      .104      .104      .104      .104      .104      .104      .103      .103
1.4   .0909     .0908     .0905     .0903     .0902     .0897     .0896     .0893     .0889
1.5   .0789     .0788     .0784     .0781     .0780     .0773     .0772     .0768     .0763
1.6   .0683     .0681     .0676     .0673     .0671     .0664     .0662     .0658     .0652
1.7   .0589     .0587     .0582     .0578     .0576     .0568     .0566     .0562     .0555
1.8   .0507     .0504     .0499     .0495     .0493     .0485     .0482     .0478     .0471
1.9   .0435     .0432     .0427     .0423     .0420     .0412     .0410     .0405     .0398
2.0   .0372     .0370     .0364     .0360     .0358     .0349     .0347     .0342     .0335
2.1   .0318     .0315     .0309     .0305     .0303     .0295     .0292     .0288     .0281
2.2   .0270     .0268     .0262     .0258     .0256     .0248     .0246     .0241     .0235
2.3   .0230     .0227     .0222     .0218     .0216     .0209     .0206     .0202     .0196
2.4   .0196     .0192     .0187     .0184     .0182     .0175     .0172     .0168     .0163
2.5   .0166     .0162     .0158     .0154     .0152     .0146     .0143     .0140     .0135
2.6   .0139     .0137     .0132     .0129     .0127     .0121     .0119     .0116     .0111
2.7   .0117     .0115     .0111     .0108     .0106     .0101     .00988    .00956    .00914
2.8   .00983    .00964    .00928    .00899    .00885    .00834    .00816    .00788    .00749
2.9   .00824    .00807    .00774    .00747    .00735    .00689    .00672    .00647    .00612



k     Normal    Cross I. C.  P[S(ξ) ≥ k]                Gamma I. C.  P[S(γ) ≥ k]
      P(S ≥ k)  C=10      C=26      C=30      C=32      C=10      C=15      C=17      C=19

3.0   .00135              .000597   .000684   .000754   .00980    .00797    .00748    .00706
3.1                       .000365   .000441   .000486   .00842    .00673    .00629    .00591
3.2   .00069              .000215   .000277   .000305   .00723    .00568    .00528    .00493
3.3                       .000122   .000169   .000186   .00621    .00479    .00443    .00412
3.4                       .0000663  .0000935  .000109   .00532    .00403    .00370    .00343
3.5                       .0000344  .0000531  .0000621  .00455    .00339    .00309    .00285
3.6   .00016              .0000169  .0000289  .0000339  .00390    .00284    .00258    .00237
3.7                       .0000078  .0000141  .0000177  .00333    .00238    .00215    .00196
3.8                       .0000033  .0000069  .0000088  .00284    .00200    .00179    .00162
3.9                       .0000013  .0000032  .0000041  .00243    .00167    .00149    .00134
4.0                       .0000005  .0000014  .0000019  .00207    .00139    .00123    .00111
4.1                       .0000001  .0000006  .0000008  .00176    .00116    .00102    .000911
4.2   .0000134                      .0000002  .0000007  .00150    .000970   .000847   .000750
4.3   .0000086                      .0000001  .0000001  .00128    .000807   .000700   .000616
4.4   .0000055                                          .00109    .000671   .000579   .000506
4.5                                                     .000923   .000558   .000477   .000414
4.6                                                     .000783   .000463   .000393   .000339
4.7   .0000014                                          .000664   .000384   .000324   .000277
4.8                                                     .000564   .000318   .000266   .000227
4.9   .0000005                                          .000477   .000263   .000219   .000185
5.0                                                     .000405   .000217   .000179   .000151
5.1   .0000001                                          .000342   .000180   .000147   .000123
5.2   .0000001                                          .000290   .000148   .000120   .0000996
5.3   .0000001                                          .000245   .000122   .0000984  .0000809
5.4   .0000000                                          .000207   .000101   .0000804  .0000657
5.5                                                     .000175   .0000828  .0000657  .0000532
5.6                                                     .000147   .0000682  .0000535  .0000431
5.7                                                     .000124   .0000560  .0000437  .0000349
5.8                                                     .000105   .0000460  .0000355  .0000282
5.9                                                     .0000884  .0000377  .0000289  .0000227

k     Gamma I. C.  P[S(γ) ≥ k]
      C=20      C=21      C=23      C=25      C=26      C=30      C=32      C=35      C=40

3.0   .00690    .00674    .00644    .00620    .00609    .00568    .00553    .00530    .00499
3.1   .00576    .00562    .00585    .00513    .00503    .00467    .00453    .00433    .00406
3.2   .00480    .00468    .00444    .00424    .00415    .00383    .00371    .00353    .00329
3.3   .00400    .00389    .00367    .00350    .00342    .00313    .00302    .00287    .00266
3.4   .00332    .00322    .00303    .00288    .00280    .00255    .00246    .00232    .00214
3.5   .00276    .00267    .00250    .00236    .00230    .00208    .00200    .00188    .00172
3.6   .00228    .00220    .00205    .00194    .00188    .00169    .00162    .00152    .00138
3.7   .00189    .00182    .00169    .00158    .00154    .00137    .00131    .00122    .00110
3.8   .00156    .00150    .00138    .00129    .00125    .00111    .00106    .000981   .000381
3.9   .00128    .00123    .00113    .00105    .00102    .000895   .000850   .000786   .000702
4.0   .00106    .00101    .000925   .000857   .000826   .000721   .000683   .000628   .000557
4.1   .000868   .000828   .000754   .000696   .000670   .000580   .000547   .000501   .000442
4.2   .000712   .000677   .000614   .000564   .000542   .000466   .000438   .000399   .000349
4.3   .000584   .000554   .000500   .000457   .000438   .000373   .000350   .000317   .000275
4.4   .000478   .000452   .000406   .000369   .000353   .000299   .000279   .000252   .000217
4.5   .000390   .000368   .000329   .000298   .000284   .000239   .000222   .000199   .000170
4.6   .000319   .000300   .000266   .000240   .000228   .000190   .000176   .000157   .000133
4.7   .000260   .000244   .000215   .000193   .000183   .000151   .000140   .000124   .000104
4.8   .000212   .000198   .000174   .000155   .000147   .000120   .000110   .0000976  .0000814
4.9   .000172   .000160   .000140   .000125   .000118   .0000954  .0000872  .0000767  .0000635
5.0   .000140   .000130   .000113   .0000997  .0000939  .0000754  .0000688  .0000602  .0000494
5.1   .000113   .000105   .0000907  .0000797  .0000750  .0000596  .0000541  .0000470  .0000383
5.2   .0000919  .0000849  .0000729  .0000636  .0000597  .0000471  .0000427  .0000368  .0000297
5.3   .0000743  .0000685  .0000584  .0000509  .0000476  .0000370  .0000335  .0000287  .0000229
5.4   .0000601  .0000552  .0000469  .0000405  .0000378  .0000292  .0000263  .0000223  .0000177
5.5   .0000484  .0000444  .0000375  .0000322  .0000300  .0000229  .0000206  .0000174  .0000136
5.6   .0000392  .0000358  .0000299  .0000256  .0000238  .0000181  .0000161  .0000134  .0000106
5.7   .0000316  .0000287  .0000240  .0000203  .0000188  .0000141  .0000125  .0000105  .0000080
5.8   .0000255  .0000230  .0000190  .0000161  .0000149  .0000111  .0000098  .0000080  .0000061
5.9   .0000204  .0000185  .0000152  .0000128  .0000118  .0000087  .0000076  .0000063  .0000047




k     Gamma I. C.  P[S(γ) ≥ k]
      C=10      C=15      C=17      C=19

6.0   .0000746  .0000310  .0000235  .0000184
6.1   .0000628  .0000254  .0000192  .0000148
6.2   .0000528  .0000208  .0000155  .0000119
6.3   .0000444  .0000170  .0000126  .0000096
6.4   .0000373  .0000139  .0000102  .0000077
6.5   .0000314  .0000113  .0000082  .0000062
6.6   .0000264  .0000093  .0000067  .0000050
6.7   .0000221  .0000076  .0000054  .0000040
6.8   .0000185  .0000062  .0000044  .0000032
6.9   .0000156  .0000051  .0000036  .0000026
7.0   .0000131  .0000041  .0000028  .0000020
7.1   .0000110  .0000034  .0000023  .0000016
7.2   .0000092  .0000027  .0000019  .0000013
7.3   .0000076  .0000022  .0000015  .0000010
7.4   .0000065  .0000018  .0000012  .0000008
7.5   .0000054  .0000015  .0000009  .0000007
7.6   .0000045  .0000012  .0000008  .0000005
7.7   .0000038  .0000010  .0000006  .0000004
7.8   .0000032  .0000008  .0000005  .0000003
7.9   .0000027  .0000007  .0000004  .0000003
8.0   .0000022  .0000006  .0000003  .0000002
8.1   .0000020  .0000005  .0000003  .0000002
8.2   .0000016  .0000003  .0000002  .0000001
8.3   .0000014  .0000003  .0000001  .0000001
8.4   .0000011  .0000002  .0000001  .0000001
8.5   .0000010  .0000002  .0000001  .0000001
8.6   .0000007  .0000001  .0000001
8.7   .0000007  .0000001  .0000001
8.8   .0000006  .0000001  .0000001
8.9   .0000005  .0000001

.0000165 

.0000132 

.0000105 

.0000085 

.0000068 

.0000055 

.0000044 

.0000036 

.0000028 

.0000022 

.0000018 

.0000015 

.0000012 

.00G0010 

.0000008 

.0000006 

.0000004 

.0000002 

.0000002 

.0000002 

.0000002 

.0000000 



.0000148 

.0000118 

.0000095 

.0000076 

.0000060 

.0000048 

.0000039 

.0000030 

.0000025 

.0000019 

.0000016 

.0000012 

.0000010 

.0000007 

.0000007 

.0000004 

.0000004 

.0000003 

.0000002 

.0000002 

.0000002 

.0000001 

.0000001 

.0000001 

.0000001 

.0000001 



.0000121 

.0000096 

.0000076 

.0000061 

.0000048 

.0000038 

.0000030 

.0000024 

.0000018 

.0000015 

.0000012 

.0000009 

.0000008 

.0000006 

.0000005 

.0000003 

.0000003 

.0000002 

.0000002 

.0000001 

.0000001 

.0000001 

.0000001 

.0000001 




.0000093 


.0000067 


.0000058 


.0000048 


.0000036 


.0000073 


.0000053 


.0000045 


.0000037 


.0000028 


.0000058 


.0000040 


.0000035 


.0000028 


.0000020 


.0000045 


.0000032 


.0000026 


.0000022 


.0000016 


.0000036 


.0000026 


.0000021 


.0000017 


.0000012 


.0000028 


.0000020 


.0000018 


.0000014 


.0000010 


.0000021 


.0000014 


.0000012 


.0000010 


.0000006 


.0000017 


.0000012 


.0000008 


.0000007 


.0000005 


.0000014 


.0000008 


.0000008 


.0000006 


.0000004 


.0000012 


.0000006 


.0000006 


.0000006 


.0000002 


.0000008 


.0000004 


.0000006 


.0000004 


.0000002 


.0000006 


.0000004 


.0000004 


.0000002 


.0000002 


.0000006 


.0000002 


.0000002 


.0000002 


.0000000 


.0000004 


.0000002 


.0000002 


.0000002 




.0000002 


.0000002 


.0000002 


.0000001 




.0000002 


.0000002 


.0000000 


.0000001 




.0000002 


.0000000 




.0000001 




.0000002 






.0000001 




.0000000 
5. Short Cumulative Poisson Table P(x,a).





      a
X     5         7         10        12        13        15        17        20

5     .56       .83       .971      .9924     .9963     .99914    .99982    .99983
6     .38       .70       .933      .980
7     .24       .55       .87       .954      .974      .9924     .9979     .99975
8     .13       .40       .78       .910
9     .068      .27       .67       .841      .900      .963      .987      .9979
10    .032      .17       .54       .76
11    .014      .10       .42       .65       .75       .88       .951      .989
12    .0055     .053      .30       .54
13    .0020     .027      .21       .42       .54       .73       .86       .961
14    .00070    .013      .14       .32
15    .00023    .0057     .083      .23       .32       .53       .72       .895
16    .000069   .0024     .049      .16
17    .000020   .00096    .027      .10       .16       .36       .53       .78
18    .000005   .00036    .014      .063
19    .000001   .00013    .0072     .037      .070      .18       .35       .62
20              .000044   .0035     .021                                    .53
21              .000014   .0016     .012      .025      .083      .19       .44
22              .000005   .00070    .0061
23              .000001   .00030    .0030     .0076     .033      .095      .28
24                        .00012    .0015
25                        .000047   .00069    .0020     .011      .041      .16
26                        .000018   .00031
27                        .000006   .00013    .00045    .0033     .015      .078
28                        .000002   .000056
29                        .000001   .000023   .000089   .00086    .0050     .034
30                                  .000009

6. Short Table of Logarithms of Factorials.



N     log N!      N       log N!

2     .30         100     158
3     .78         200     375
4     1.4         300     614
5     2.1         400     868
6     2.9         500     1134
7     3.7         600     1408
8     4.6         676     1621
9     5.6         700     1689
10    6.6         800     1977
20    18.4        900     2270
26    26.6        1000    2568
30    32.4        1024    2640
32    35.4        2000    5736
36    41.6        3000    9131
40    47.9        4000    12673
50    64.5        5000    16326
60    81.9
70    100.
80    119.
90    138.
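Entries between or beyond those tabulated can be computed from the log-gamma function, since N! = Γ(N + 1). This is a small sketch assuming common (base 10) logarithms, which is what the tabulated values correspond to; the function name is illustrative.

```python
import math


def log10_factorial(n):
    """Common logarithm of n!, computed as lgamma(n + 1) / ln(10)."""
    return math.lgamma(n + 1) / math.log(10)


for n in (26, 100, 676, 5000):
    print(n, round(log10_factorial(n), 1))
```

Working with logarithms of factorials lets binomial terms such as $\binom{N}{j}(c-1)^{N-j}/c^N$ be evaluated for large N without overflow, by summing logarithms and exponentiating only at the end.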